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Foreword 

Ever  since  the  publication  of  the  papers  by  Kalman  and  Bucy  in 
1960  and  1961,  the  engineering  community  has  made  giant  strides  in 
applying  their  ideas  concerning  linear  estimation  to  an  essentially 
uncountable  number  of  space  and  control  problems.  The  widespread  and 
almost  instantaneous  acceptance  of  the  Kalman-Bucy  filter  was  due  in 
part  to  a  number  of  factors:  the  then  common  frequency  domain 
synthesis  procedures  were  limited  to  time-invariant  (steady-state) 
problems,  the  advent  of  the  digital  computer  led  to  easy  simulation 
of  time  functions,  whereas  frequency  calculations  were  indirect  and 
less  convenient  than  "state-space",  and  most  important,  the  newly 
accelerated  man-in-space  program  led  to  an  immediate  real-time 
application,  the  most  significant  catalyst  for  engineering  developments. 

The  fact  that  most  applications  were  not  lipear  systems  did  not 
diminish  the  enthusiasm  for  the  new  linear  theory.  Almost  over  night, 
under  the  pressures  of  ever-present  engineering  deadlines,  a  host  of 
approximate  approaches  to  the  nonlinear  problems  were  generated  with 
many  variations  and  extensions,  but  most  of  which  were  patterned  to 
look  much  like  the  highly  successful  linear  estimator  of  Kalman  and  Bucy. 
In  some  applications  these  approximations  produced  highly  satisfactory 
results,  but  in  most  situations  a  second  design  phase,  less  publicized 
but  equally  important,  was  carried  out  in  order  to  eliminate  the 
anomalies,  instabilities  and  unexplained  peculiarities  of  the 
estimators.  A  seemingly  endless  stream  of  technical  papers  dealing 
with  adaptive  techniques  and  examples  of  clever  but  non-gene raliz able 
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"engineering  fixes"  for  the  multitude  of  undesirable  characteristics  of 
the  approximations  have  appeared  during  the  last  3- 1C  years,  resulting 
in  an  apparently  bottomless  bag  of  engineering  tricks  for  the  estimation 
engineer.  In  addition,  the  technical  access  to  the  original  articles 
was  evidently  unsatisfactory  since,  for  a  variety  of  reasons,  many 
authors  have  devoted  tutorial  articles  explaining  how  the  same  equations 
for  estimation  can  be  derived  from  a  >ariety  of  points  of  view, 
including  Bayes  rule  (the  original  concept),  least  squares, maximum 
likelihood  and  a  host  of  others.  The  results  would  fill  many  volumes 
of  "handbooks"  for  the  engineer  while  neither  answering  nor  posing 
the  questions  -  where  do  we  go  from  here?  or  ~  do  we  really  understand 
nonlinear  estimation? 

On  the  other  hand,  it  is  fair  to  say  that  nonlinear  estimation 
theory,  as  advanced  and  generalized  as  it  has  become,  has  not  been 
pub__  '.zed  and  advertised  as  a  general  panacea  for  the  estimation 
problems  of  the  future.  A  small  but  growing  contingent  of  university 
researchers  has  been  working  steadily  on  problems  involving  mathe¬ 
matically  subtile  concepts  such  as  stochastic  integrals,  Ito  calculus, 
diffusion  processes,  etc.,  and  a  fe1  elegant  and  mathematically 
sophisticated  theorems  have  been  proved  concerning  general  representa¬ 
tions  for  nonlinear  estimators.  But  the  very  difficulties  which 
frustrated  the  application  of  these  "solutions"  is  their  nature: 
estimates  are  determined  to  be  related  to  the  numerical  solution  of 
infinite-dimensional  partial  differential  equations  -  a  formidable 
task  in  the  simplest  of  problems,  and  thus,  understandably,  the  paths 
of  estimation  application  and  nonlinear  estimation  theory  have  continued 
to  diverge.  On  the  rare  occasion  that  representatives  of  the  two 
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factions  have  met  much  needless  antagonism  has  resulted.  While  it  is 
true  that  the  questions  most  frequently  asked  by  the  applications 
engineers  rarely  deal  with  the  optimality  of  estimates  but  merely  with 
the  computational  traceability,  and  vice  versa  for  the  theorists,  it 
appears  that  a  partial  reconciliation  on  both  sides  seems  likely  to 
provide  mutual  enrichment. 

Specifically,  what  is  proposed  here  is  that  all  interested  parties 
should  stand  back  for  a  moment  and  reflect  on  the  question  -  what, 
exactly,  are  the  characteristics  of  optimal,  nonlinear  estimators?  The 
answer  to  this  and  related  questions  provides  the  motivation  for  the 
work  described  in  this  paper.  We  ask  that  the  applications  engineer 
temporarily  suspend  his  seemingly  unattainable  computational  constraints 
and  that  the  estimation  theorist  pause  briefly  from  his  esoteric 
pursuits  involving  the  subtleties  of  measure  theory  and  look  for  awhile 
at  some  examples  of  optimal,  discrete- time,  nonlinear  estimators.  (As 
it  turns  out,  the  discrete-time  problem,  although  far  from  trivial,  is 
at  lease  tractable  for  solution  on  modern  digital  computers.)  In  this 
way,  it  is  expected  that  the  benefits  to  both  factions  will  be 
considerable,  and  perhaps  their  steady  divergence  may  be  curtailed. 

In  effect,  it  is  hoped  that  the  engineer  engaged  in  the  daily 
process  of  fitting  old  tricks  into  new  computers  will  begin  to  ask 
whether  or  not  the  basic  principle  ot  the  trick  is  applicable  to  the 
specific  application  at  hand  -  perhaps  a  simple  modification  or  another 
approach  would  be  much  more  satisfactory.  Such  realizations  are 
generally  obtained  at  the  expense  of  serious  time  delays  and  costly 
experimental  failures  -  perhaps  added  experience  with  optimal  nonlinear 
estimators  could  provide  more  effective  and  less  expensive  guidelines. 
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In  addition,  it  is  hoped  that  the  theoretician  can  appreciate  the 
reality  of  concrete  examples  and  thereby  refuel  his  fire  regarding 
theorems  concerning  the  characteristics  and  asymptotic  descriptions 
of  optimal  nonlinear  estimates.  Finally,  it  is  intended  that  certain 
applications  of  nonlinear  estimators  be  considered  for  which  the 
optimal  solution  can  itself  be  considered  practical,  if  not  via  present 
day  realizations,  then  perhaps  by  special  purpose  hardware  designed 
with  optimal  estimation  in  mind.  This  paper  has  been  written  to 
instill  some  enthusiasm  in  the  reader  for  all  of  these  expectations. 


Acknowledgements 


The  authors  are  deeply  indebted  to  the  generous  and  enthusiastic 
support  of  the  U.S.  Air  Force  and  to  the  National  Aeronautics  and  Space 
Administration.  The  research  described  herein  has  been  performed  at 
the  Frank  J.  Seiler  Research  Laboratory,  Air  Force  Systems  Command,  as 
the  University  of  Southern  California  Department  of  Aerospace  Engineering 
(Air  Force  Office  of  Scientific  Research  Grant  AFOSR-71-2141) ,  and  in 
conjunction  with  Electrac  Corporation  (NASA  Grant  NAS5-10789,  1970). 
Although  many  persons  must  be  acknowledged  regarding  this  research  and 
the  associated  experiments,  perhaps  the  two  most  important  figures  are 
Colonel  Bernard  S.  Morgan,  Jr.,  USAF,  and  Major  Allen  0.  Dayton,  USAF, 
who  had  the  insight  and  intuition  necessary  to  introduce  the  authors 
of  this  paper  to  each  other  in  the  first  place.  It  would  be  very 
fortunate,  indeed,  if  such  technical  interchanges  and  mutual  cooperation 
were  to  continue  to  be  encouraged  among  government  sponsored  and 
university  researchers. 


vii 


Table  of  Contents 


Page 

Publication  Notice  ii 

Foreward  iii 

Acknowledgements  vii 

Table  of  Contents  ix 

Table  of  Figures  xiii 

Table  of  Tables  xvi 

I.  Introduction  1 

References  6 

li,  Bayesian  Estimation:  The  Problem  9 

References  21 

III.  Finite  Dimensional  Approximations  23 

A.  Introduction  23 

B.  Orthogonal  Series  24 

1.  Least  Square  Polynomial  Approximation,  Scalar  Case  25 

2.  Least  Square  Polynomial  Approximation, 

Multi-dimensional  Case  31 

3.  Gauss-Hermite  Integration  34 

4.  Application  of  Polynomial  Expansions  to  a  Two- 

Dimensional  Filtering  Problem  39 

5.  Applying  the  Hermite  Expansion  50 

Preceding  page  blank 

ix 


C.  The  Point-Mass  Approximation  64 

D.  Non-Orthogonal  Series  -  Gaussian  Sums  68 

E.  Other  Computational  Methods  74 

1.  Fourier  Series  Expansions  74 

2.  Spline  Functions  75 

References  77 

IV.  Monte  Carlo  Analysis  Techniques  79 

A-  Introduction  79 

B.  Statistical  Analysis  80 

C.  An  Example  85 

D.  Conclusions  94 

References  96 

Appendix.  A  Machine  Independent  Random  Number  Generator  97 

A.  Introduction  97 

B.  An  Example  100 

V.  Parallel  Computational  Techniques  115 

A.  Introduction  115 

B.  Parallelism  and  Bayes  Law  115 

C.  Look-Ahead  Processors  122 

D.  Array  Processors  124 

E.  Associative  Frocesso’-s  125 

F.  Pipe-Line  Processor  125 

G.  Hybrid  Computer  Methods  127 

References  129 

VI.  A  Passive  Receiver:  Bearings-Only  Tracking  (AWACS)  131 


A.  Introduction 


131 


i»8i 


3.  The  Linearized  Estimator 

C.  Application  of  Nonlinear  Filtering 

References 

Appendix  A,  First  Monte  Carlo  Experiments  -  Point  Masses 
Versus  Linearized 

Appendix  B.  More  Recent  Experimental  Results:  Point 
Masses  Versus  Gaussian  Sums 
Appendix  C.  A  Movie  of  Conditional  Densities 

VII.  Example:  Optimal  onlinear  Phase  Demodulation 

A.  Introduction 

B.  The  Linearized  Filter 

C.  Application  of  Nonlinear  Filtering 
Referenced 

Appendix  A.  Numerical  Experiments  with  the  Phase- 
Locked  Loop 

Appendix  B.  Numerical  Experiments  with  The  Hermite 
Polynomial  Expansion 

Appendix  C.  Cyclic  Point-Mass  Experiments 
Appendix  D.  A  Fourier  Series  Experiment 
Appendix  E.  A  Movie  of  Conditional  Densities 

VIII.  Conclusions 
Bibliography 
Resumes  of  the  Authors 

Richard  S.  Bucy 
Calvin  Hecht 
Kenneth  L.  Senne 


136 

137 

140 

141 

147 

155 

183 

183 

184 
198 
205 

207 

213 

227 

243 

249 

267 

271 

275 

276 
2  U 
282 


xi 


/  Page 

Additional  Appendix  A.  n'A  Two-Dimensional  Point-Mass  Program 

for  the  Passive  Receiver  Problem  235 

Additional  Appendix  B.  A  Gaussian-Sum  Program  for  the  Passive 

Receiver  Problem  355 

Additional  Appendix  C.  A  Gauss-Hermite  Program  for  Implementing 

the  Two-Dimensional  Phase  Demodulator  379 

Additional  Appendix  D.  A  Point-Mass  Program  for  Implementing 

the  Interpolating  Version  of  the  Cyclic  Phase  Demodulator  419 

Additional -Appendix  E.  A  Fourier  Series  Implementation  of  the 

Cyclic  Phase  Demodulator  445 

Additional  Appendix  F.  Some  Unpublished  Conference  Papers 

Referenced  by  this  Report  465 

R.S .  Bucy,  "Realization  of  Non-Linear  Filters"  467 

R.S.  Bucy,  M.J.  Merritt,  and  D.S.  Miller,  "Hybrid 
Computer  Synthesis  of  Optimal  Discrete  Nonlinear 
Filters"  475 

C.  Hecht,  "Digital  Realization  of  Non-Linear  Filters"  505 

K.D.  Senne,  "Computer  Experiments  with  Nonlinear  Filters"  513 


<i . 


Table  of  Figures 


Chapter  III 
Fig.  1. 
Fig.  2. 
Chapter  IV 
Fig.  1. 

Fig.  2. 

Fig.  A-l. 
Chapter  V 
Fig.  1. 
Fig.  2. 
Fig.  3. 
Chapter  VI 
Fig.  1. 
Fig.  B-l. 


Hermite  Polynomial  Bayes-Law  Recursion 
Coordinate  Systems  for  Hermite  Expansion 

Example  of  a  Questionable  Monte  Carlo 
Cumulative  Average  Sample  Path 
Probability  Density  and  Distribution  of  the 
Asymptotic  (N-*°°)  Kolmogorov  Statistic 
Algorithm  for  Partitioned  Uniform  Generator 

Seri  "\1  Eva'  ition  of  a+b+c+d+e+f 
Maximally  Parallel  Evaluation  of  a+b+c+d+e+f 
Combining  Ser.'al  and  Parallel  Computations 


Typical  Passive  Receiver ^Geometry 


Typical  Geometry  of  the  "Old  Problem"  - 
Illustrates  Periodicity  of  Errors 
Fig.  B-2.  "New  Problem"  Geometry  without  Periodic  Errors 
Fig.  C-l.  Detection  Geometry  in  the  Presence  of  Multipath 
Reception 

Fig.  C-2.  A  Priori  Density  Resulting  from  Multipath 
Detection  Ambiguicy 


Page 

41 

58 


84 

91 

101 

117 

118 
121 

133 

149 

151 

156 

158 


xiii 


Fig,  0*3.  A  Typical  Sample  Path  Resulting  from  the 
Multipath  Detection  Ambiguity 
Fig*  C-4.  Absolute  Error  Performance  of  Optimal  and 

Linearized  Predictors  for  Multimodal  Problem 


Chapter  VII 


159 


181 


Fig.  1. 

Block  Diagram  of  Linearized  Phase  Estimation 

189 

Fig.  2. 

Discrete  P  Error  Variance 

195 

Fig.  3. 

Discrete  P22  Error  Variance 

19b 

Fig.  4. 

Discrete  P,  Error  Variance 

12 

197 

Fig.  5. 

Torus  Interpretation  of  Doubly  Cyclic  State  Space 

201 

Fig.  A-l. 

MSE  Performance  Summary 

209 

Fig.  A-2. 

Fourth  Moment  Divided  by  Three  Times  the  Squared 

Variance  for  the  Phase-Locked  Loop  Error 

211 

Fig.  B-l. 

Hermite  Expansion  Error  Summary  -  P(o)  =  4P(°°) 

218 

Fig.  B-2 . 

Cumulative  Statistical  Variance  -  P(o)  =  4P («>) 

219 

Fig.  B-3. 

Portion  of  Sample  Function  No.  6  - 

Pn(o)  =  0.3025 

220 

Fig.  B-4. 

Error  for  Sample  Function  No.  6  - 

Pn(o)  =  0.3025 

221 

Fig.  B-5. 

Error  Variance  for  Pjj(o)  =  “5*2  dB  Starting 

at  Pj  j  (o)  =  4Pn(“) 

223 

Fig.  B-6 . 

Cumulative  Statistical  Variance  for 

pH(o)  =  -5.2  dB 

224 

Fig.  C-l. 

Nonlinear  Filter  Summary  (Enlarged) 

231 

Fig.  C-2 . 

MSE  Improvement  of  Nonlinear  Filters  over 

Phase-Locked  Loop 

235 

xiv 


Fig.  C-3.  MSE  Difference  from  Ideal  Linear  Analysis  237 

Fig,  E-l.  A  Typical  Sample  Path  of  Densities  Evolving 

in  Time  250 


xv 


e  \SJsVf  '5*=  '  >'<  — 


Table  of  Tables 


Page 

Chapter  XI 

Table  1.  Model  for  Bayes  Rule  Conditional  Density 

Recursion  Formula  18 

Table  2.  Conditional  Density  Recursion  formulae  for 

Bayesian  Estimation  19 

Chapter  III 

Table  1.  Outline  of  a  Gaussian-Sum  Recursion  Procedure  70 

Chapter  IV 

/p-l\ *2 

Table  1.  The  Normalized  Standard  Deviation  21 — g —  1  as  a 


function  of  P  and  N  82 
Table  2.  Monte  Carlo  Moments  of  Gaussian  Generator  88 
Table  3.  Testing  the  Hypothesis  of  Gaussian  Moments  88 
Table  4.  Pr(K^  <  ^rom  Massey  90 
Table  5.  Results  of  Kolmogorov  Test  92 
Table  6.  Sampled  Correlation  Function  93 
Table  7.  Uncorrelatedness  Test  94 


Table  A-l.  Partition  Requirements  for  m=36  bit  random 

numbers  102 
Table  A-2.  Partition  Examples  103 
Table  A-3.  Initial  Sample  Path  -  Sequence  One  105 
Table  A-4.  Initial  Sample  Path  ~  Sequence  Two  106 


xvi 


Table  A-5.  Repeat  Characteristics  of  the  Generator  for 
each  Sequence 

Table  A-6.  FORTRAN-II  Coding  Examples 
Two-Piece  Generator 
Three-Piece  Generator 
Four-Piece  Generator 
Six-Piece  Generator 

Table  A-7.  FORTRAN-II  Coding  of  Gaussian  Generator 
Chapter  VI 

Table  A-l.  Monte-Carlo  Performance  of  the  Optimal  and 
Linearized  Predictors 

Table  A-2.  Monte-Carlo  Performance  of  Optimal  and 
Linearized  Filters 

Table  A-3.  Monte-Carlo  Confidence  Intervals  for  Predictors 

Table  B-l.  Monte-Carlo  Average  Sum  Squared  Error 

Performance  for  Predictors  -  Old  Problem 

Table  B-2.  Monte-Carlo  Averaged  Sum-Squared  Error 

Performance  for  Predictors  -  New  Problem 


Chapter  VII 
Table  1. 

Table  2. 

le  A-l. 
‘table  B-l. 


Summary  of  Continuous  Linearized  Kalman-Bucy 
Filter 

Summary  of  Discrete  Linearized  Kalman-Bucy 
Filter 

Confidence  Intervals  for  the  Linearized  Filter 
Numerical  Values  for  the  Computer  Simulation 


Page 

107 

108 
109 
11C 
111 
113 


142 

143 

144 

149 

132 


i85 

191 

212 

216 


xvii 


Table  C-l.  Monte  Carlo  Mod  2it  Error  Performance  Data  for 


the  Cyclic  Point-Mass  Estimates  233 

Table  C-2.  Monte  Carlo  Improvements  -  Cyclic  Point-Mass 

over  Phase-Locked  Loop  234 

Table  C-3.  Monte  Carlo  Difference  Between  Cyclic  Point- 

Mass  and  Ideal  Linear  234 

Table  C-4.  Timing  Estimates  238 

Table  C-5.  n/m  Constant  239 

Table  C-6.  n  Constant  240 

Table  C-7.  m  Constant  241 


xviii 


T*>  rr*f“V . 


1.  Introduction 

This  report  is  intended  to  serve  as  a  record  of  specific  experiments 
with  feasible  realizations  of  optimal  nonlinear  estimators.  In  addition, 
it  is  expected  that  future  research  along  r.he  lines  described  herein  will 
continue  to  result  in  increased  understanding  of  the  behavior  of  optimal 
estimates  and,  possibly,  in  come  guidelines  for  actual  realizations  in 
particular  applications. 

The  underlying  thread  of  continuity  connecting  all  segments  of  this 
research  is  Bayes-Law  (See  Chapter  II),  the  general  solution  to  the 
discrete-time  nonlinear  estimation  problem.  Bayes-Law  is  in  effect  the 
discrete  "representation  theorem"  (see  Bucy  and  Joseph  [7]).  Some  of  the 
earliest  attempts  to  realize  Bayes  Law  on  the  digital  computer  involved 
orthogonal  series  representations  of  the  conditional  densities  employing 
Gram-Chalier  series  [23],  or  Edgeworth  expansion  [22].  An  early  problem, 
however,  concerned  the  tendency  for  truncated  expansions  to  become 
predominantly  negative  resulting  in  unavoidable  numerical  instabilities. 

In  1969  Bucy  [4]  proposed  a  point-mass  approximation  to  density 
functions  which  involved  a  selection  of  important  points  on  a  "floating 
grid"  and  the  centering  of  mass  on  the  selected  points.  The  point-mass 
approximation  was  of  course  always  positive,  easy  to  implement,  and 
numerically  stable.  The  computational  burden  was  unfortunately  pro¬ 
hibitive  for  high  dimensional  systems,  however,  and  many  short-cuts  and 
simplifications  were  made  by  Bucy  and  Senne  [10],  [20]  in  order  tc  make 
the  computations  tractable.  A  passive  receiver  problem  was  introduced 
by  Bucy,  Geesey,  ?r.d  Senne  [6]  to  illustrate  the  concerts  of  point-mess 
approximation  and  the  associated  problems. 
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In  an  independent  effort  simultaneous  to  the  above  work,  Alspach 
and  Sorenson  [2],  [21]  pioneered  an  approach  based  on  a  nonorthogonal 
series  of  Gaussians  densities,  originally  chosen  to  be  set  down  in  such 
a  way  so  as  to  minimize  an  criterion,  but  subsequently  to  be  deter¬ 
mined  via  a  simple  approximation,  again  in  order  to  provide  a  short-cut 
to  the  otherwise  prohibitive  computations. 

In  1970  at  the  Nonlinear  Estimation  Symposium  in  San  Diego  Bucy  and 
Senne  [9]  and  Alspach  and  Sorenson  [2]  each  described  their  respective 
approaches  to  the  Bayes-Law  computations,  thereby  providing  the  impetus 
for  a  sizable  new  interest  in  the  computations  associated  with  nonlinear 
estimation. 

During  the  last  two  years,  a  multitude  of  topics  related  to  Bayes- 
Law  computation  have  emerged.  Edison  Tse  [24]  has  noted  a  link  between 
the  previous  two  methods  involving  a  Fourier  transform  translation 
theorem  due  to  Wiener  [26].  Julian  Center  [11]  has  observed  the  relation¬ 
ship  between  generalized  least-squares  projection  and  series  expansions 
of  density  functions.  Hecht  [13]  [14]  has  taken  a  much  closer  look  at 
orthogonal  polynomials  -  notably  Gauss-Hermite  expansions.  Bucy,  Merritt, 
and  Miller  [8],  [18]  have  discussed  hybrid  solutions  to  reduce  the  serial 
computational  burdens  of  Bayes  Law  by  substituting  the  natural  parallelisms 
of  hybrid  computers.  Another  promising  approach  to  the  approximation 
problem  involving  generalized  splines  has  been  studied  by  deFigueiredo 
and  Jan  [12],  [15],  while  Weinert  and  Kailath  [25]  have  been  relating 
splines  to  least-squares  approximation  projection.  Thus  the  subjects  of 
numerical  methods  and  optimal  nonlinear  estimation  are  now  firmly  entrenched. 


Meanwhile,  still  another  practical  application  of  Eayesian 
estimation  has  recently  been  studied  by  Mallinckxodt,  Bucy,  and 
Cheng  [17],  by  Hecht  [14],  by  Bucy  [3],  and  by  Senne  [19],  and  is 
reviewed  in  the  current  report.  The  raw  application  involves  demodulation 
of  phase-modulated  signals  observed  in  additive  white  noise.  Since  the 
nominal  engineering  solution  to  such  problems  involves  the.  well-known  and 
reliable  phase-locked  loop,  it  appears  that  the  demodulation  problem  will 
continue  to  provide  an  important  comparison  between  moment  series 
approximations  and  numerical  density  approximations. 

Moment  series  approximations  are,  of  course,  the  most  commonly 
encountered  nonlinear  estimates  in  engineering  practice  today.  The 
simplest  moment  approximation  has  been  referred  to  by  the  names  "extended" 
or  "linearized"  Kalman-Bucy  filter  [7],  The  appeal  of  such  methods  is 
highly  warranted  in  many  problems,  since  the  non? inearities  are  not 
severe  (i.e.  they  don't  have  arbitrarily  large  derivatives),  and 
frequently  the  assumption  of  Gaussian  noises  is  adequate.  Whenever 
either  or  both  of  the  "well-behaved"  assumptions  Is  false,  however, 
considerable  controversy  has  resulted.  Some  have  advocated  higher- 
order  moment  expansions,  [3],  others  have  proposed  adaptive  noise 
tracking  techniques  [16],  or  finite- memory  filters  [16],  but  generally 
nobody  seems  to  ask  the  most  fundamental  question:  What  characteristics 
of  the  filtering  problem  have  led  to  the  demise  of  the  .simple  first- 
order  method?  Or,  equivalently,  how  would  an  optimal  estimate  behave 
in  such  a  situation?  The  answers  to  these  and  other  yu .-.cions  are 
directly  addressed  in  the  present  report. 
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The  existing  organization  of  the  report  was  necessitated  due  to 
time  constraints,  and  although  many  different  topics  are  addressed, 
there  is  occasionally  some  duplication.  The  attempt  was  to  assemble 
a  chronolog  of  the  more  significant  results  of  the  authors  during  the 
past  three  years  into  one  source,  thereby  providing  a  focal  point 
for  subsequent  research  in  the  field.  We  apologise  beforehand  for  any 
unavoidable  difficulties  for  the  reader  caused  by  the  presental ion  of 
the  material.  The  global  organization  of  the  chapters  is  as  follows: 

Chapter  II  contains  a  simplified  summary  of  a  derivation  of  the 
principal  Bayes-Law  formulas  us^d  throughout  the  report.  The  presen¬ 
tation  is  taken  principally  from  the  dissertation  of  Hecht  f 14 ] . 

Chapter  III  provides  a  background  for  the  various  proposed  numerical 
representations  of  the  conditional  densities.  Covered  in  greatest  detail 
are  the  orthogonal  series,  exemplified  by  Gauss-Hermite  and  Fourier, 
and  the  point-mass  representation  of  Bucy  and  Senne  [10].  Discussed 
in  lesser  detail  are  the  nonortnogonal  series  (such  as  Gaussian  sums) 
and  generalized  spline  functions. 

In  Chapter  IV  the  important  topic  of  Monte  Carlo  simulation  is 
treated  in  considerable  depth.  In  particular,  the  subject  of  ex¬ 
perimental  confidence  is  treated  in  great  detail  and  an  example  of  the 
analysis  is  given  involving  the  testing  of  the  gaussian  random  number 
generator  (which  is  realizable  on  any  binary  computer)  to  determine 
its  statistical  properties.  Thus  the  reader  is  left  with  a  complete 
understanding  of  the  experimental  methods  used  for  this  report. 

Chapter  V  describes  another  important  side-light  of  the  current 
investig? cion  -  computer  architecture.  The  concepts  of  parallelism 
and  asynchronous  computation  are  introduced  and  the  Bayes  estimation 
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problem  is  interpreted  in  light  of  parallel  digital  architecture. 

At  first  the  "ideal"  machine  is  postulated  for  computation  of  Bayes  law. 
Then,  as  allowance  is  made  for  technical  feasibility  and  current 
computer  architecture,  observations  are  made  concerning  efficient  use 
of  such  structures  as  array  processors  (like  Illiac  IV),  pipe-line 
machines  (like  CDC  Star),  look-ahead  machines  (like  CDC  6600  or  ?'.(J0), 
associative  processors  (like  Goodyear's),  and  multi-processorr.  (like 
the  Burroughs  D-Machines).  Finally,  some  consideration  is  given  to 
the  currently  available  hybrid  computer  systems  -  examining  their 
intrinsic  parallelism. 

Chapters  VI  and  VII  provide  the  details  concerning  the  two  examples 
studied  in  this  work.  The  first  example  deals  with  a  receive-only 
tracking  system  or  a  passive  tracking  receiver,  which  attempts  to 
locate  a  target  on  the  basis  of  bearing  information  imbedded  in  additive 
noise.  The  problem  description  is  ver’-  similar  to  the  Airborne  Warning 
and  Command  System  (AWACS),  which  is  currently  under  contract  development 
for  the  Air  Force.  The  other  example  deals  with  phase  demodulation, 
whereby  a  phase-modulated  signal  is  observed  in  additive  noise  and 
it  is  desired  to  retrieve  the  original  message  process  -  at  least 
modulo-27r. 

Chapter  VIII  contains  a  brief  conclusion  concerning  the  Bayes 
estimation  research  and  indicates  some  paths  for  future  developments. 

The  appendices  provide  documentation  on  some  of  the  computer 
programs  used  and  some  unpublished  technical  papers  relevant  to  the 


current  research. 
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II.  Bayesian  Estimation:  The  Problem 


Although  the  equations  for  Bayesian  estimation  are  relatively  well 
known,  having  been  derived  for  example  by  Bucy  [1],  [2],  a  modified 
derivation  is  included  in  this  chapcer  for  the  sake  of  introducing 
relevant  notation  and  to  make  the  present  exposition  as  self-contained 
as  possible.  This  presentation  is  taken  from  Hecht  [3]. 

The  discrete-time  process  and  measurement  equations  may  b°  written 
as 


-  »<Vi> 


a  ,  (x 
n-1  -n- 


l}Vl 


(1) 


x  =  c 


2  =  h(x  )  +  V 

-n  —  n  -n 


(2) 


Equation  (1)  represents  a  discrete  time  signal  process  with  x  a 

— n 

sequence  of  d-dimensional  random  vectors;  the  subscript  n  refers  to 

time.  <t(x  ■,)  is  a  function  from  to  R^,  o(x  a  function  from 

-n-1  -n-1 

R  to  d  x  r  matrices.  The  process  {u  }  .  is  a  set  of 

r  -n  n=l, . . . 

independent  r  -  dimensional  random  vectors  with  density  p  (w) .  The 

n 

random  vector  £  is  d-dimensional,  independent  of  the  u^  process, 
and  has  density  Pc(x) . 

Equation  (2)  represents  the  observation  process,  with  z  a 
sequence  of  s-dimensional  random  vectors,  hCx^  a  function  from  Rc 

g 

to  R  and  {v  }  a  set  of  independent  s-dimensionai  random 

~n 

n-i, • « * 

vectors  with  density  p  (0),  independent  of  _c  and  the  u  process. 

n  ~n 

The  filtering  problem  consists  of  determining  the  conditional 

density,  given  as 

Preceding  page  blank 
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J  i..(y)dy  =  P(X  edyjz  =  z^,...,z  =  z  ) 

n  1 1  -n  — t  — t  —o  —o 


Results  are  stated  in  the  same  notation  as  Bucy  [2],  The  following 
notation  will  be  used  in  discussing  the  derivation  of  the  conditional 
densities. 

Underlined  lower-case  Latin  letters  denote  the  name  of  random 
variables  or  random  vectors,  and  related  Greek  letters  represent  the 
dummy  argument  associated  with  the  density  or  distribution  functions. 
p(*)  -  probability  density  function 
P(*)  =  probability  distribution  function 
(The  above  functions  are  referred  to  briefly  as  "density"  and 
"distribution"  respectively.) 


Thus,  for  example, 


p  (£  )  =  the  density  function  of  the  random  vector  x^,  with 

xn  "~n 

£  as  dummy  argument. 

■*n 

P  (€  |z  =C  )  =  the  conditional  density  of  the  random  vector  x  , 

x  -n'-n  -n 
n 

given  the  random  vector  z  has  taken  on  the  value 


p  f  z  ■;  is  a  function  of  £  and  c  • 

*x  Vin'-n  — n  *n  ~n 

n 

The  above  is  frequently  abbreviated  to  p^  (J^lz^-J^)  =  Px 

n  n 

with  the  argument  C  implied,  p^  x  *  the  joint  density 

~n  n,  n-1 

of  the  random  vectors  x  and  x 

-n  -n-1 

Using  the  coove  notation  (3)  is 

. vV  •  <' 


It  is  noted  in  Bucy's  original  work  that  the  signal  process  is  a  Markov 
process  with  transition  density 


J-'H  »*»«V- -»'»•-  » 
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The  required  recursion  relations  for  the  conditional  density  ^n|n  are 
given  by  the  following  equations. 

Jn|n(V  "  K(n)  Pv  ]fdfVau  ^“*^n-l^  Jn-l|n-l(4i-l)d4i-l*  (6) 

1  n  ~  n  1 

with 


Jo|o(‘^  K(o)  Pv0^~-^^Pc^X^  ’ 

(7) 

K(0)  =  P^^)  . 

(8) 

K(n)  “  Pz  _!»--•  .z^j)  » 

n 

(9) 

Vr|„<W  ’  fdf\^k-*r' WJn| „<Vd4  - 

(10) 

J.H-iint^n+l)  "  K(n)  ipvi]I-Sn"-<4i)  1  Jn| 

n-l(Vd4(11) 

where  previously  undefined  symbols  have  the  following  meaning! 

M  •)djc  =  JdJ(')dx^t . . .  ,dx^  , 

=  d  integrations 

the  density  of  the  d-dimensional  random  vector 


and  p  (•) 
ou  , 
n-1 


a(x  ,)u 
-n-1  -n-1 


The  derivation  oi  (6.'  through  (11)  follows: 


Jo|o<*c)d*c 


P  (-  Z  = 
*  xQ  —0  0 


V 


PZo(^|xo=io>Px0(V 


(12) 


by  Bayes  rule. 
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Pz  (jLg|.2g=£g)  corresponds  to  the  distribution  P(z^  £  £g| Xg®!^) 

■  r&<V%  i^W 

■  Pt»o  <  io-iti,)] 

which  corresponds  to 

Pv/V^V1'  ot  Pz/^VV-Pv^V^V1  •  H3) 


Substituting  (13)  into  (12)  gives 

1 


P„  (£jz^.£j  “ 


x„  ■i-0'£<f-V  ”  pz  (£g)  Pv0 


kToypv0[(V^(V]pc(^ 


which  is  (7).  Next,  using  relations  of  conditional  and  joint  densities, 

Jn|n(in)d4i  =  Px  <5JV  *  *  *  ’V 

1  n 

Px  , z  ,...,zn(V'5n . V 

n  n _ 0 _ 

pz  ,...,zn(in . V 

n  u 

Px  ,2  ^n,5nlVl . VP*  . . z.(Vl . i«) 

pz  <-finlVl’**,,£0)pz  ..... ,  z„ (- in-1 . V 

n  n-1  0 

=  k(To  Px  ,z  (4i,-UVi,-,*>-5o) 

n  n 

K(n)  ^/Px  ,z  ,x  ,  ^n’-^n’^n-l^-Ti-l’  **  ’  '^O^-^n-l 
n  n  n-1 


Now  operating  on  the  integrand, 


I 


14 

P(4  -UVl>  "  p(«Vl,+0<Vl>Vl  -^JVl1 
'  PtotVi,Vi  i  VKVi,^-i) 
■n»UVii^(W  <20> 

the  corresponding  density  is  obtained, 


(UVi! 


pa(x  .  )u 


(21) 


Putting  (18),  (19),  and  (21)  into  (14), 


Jn|„«„>d£„ 

=  K(n)Pv  >u  [4i_<!,(4i-l)]Jn-l|n-l(4i-l)d4i-l 

n  "  *'  n-1;  n-1  1 


which  is  (6) . 

Making  the  substitutions  L^Yj  assuming  the  u^  and 

processes  are  zero  mean  gaussian  with  variances  Q  and  R,  and  using 
the  notation  N(£,A)  for  gaussian  density  with  covariance  A  and 
argument  5,  (22)  becomes 

Jn|n(^  =  K^O  (23) 

Equations  (6),  (22)  and  (23)  are  referred  to  as  the  conditional  filter 
density  update  equations  (for  the  gaussian  special  case). 

Continuing  with  the  development, 

dn+r|n^^n+r^  Px  ,  ^n+r^^n’  ‘  ’  * 

1  n+r 

=  fdfPx  .  ,x  (Vr,inl^n’,,,,-50)d-£-n 

n+r  n 

=//Px  +  UV2n’",,VPx  (^n!^n . Vd^n 

=//PX  ^  (WVP*  . -2o)d4, 

*  J  n+r  n 


(24) 


by  the  Harkov  property. 


Vr|n<W  (25) 

which  is  (10). 

No ting  p  =  Pff(x  )u  from  (21)’  and 

n+1  n  n 

substituting  into  (25) , 

Jn+l|n<4i+l>  W«4)1J»|n«n)din  (26) 

then  substituting  (22)  for  Jn|n(^n)d^n 

•  jj^  #0  (Vl)VlIJ”'*(S»-l)  Jn-l|  „-l<4-l>dVl  Jd4 

using  (21), 

=fdfP°(x  )u  t4+l'<f(4i)]  K00  Pv  t-5n“-(4i)  ] 
v  n  n  n 

•  J  d4 

■  (27> 

by  again  using  (25) .  Equations  (27)  and  (11)  are  identical. 

By  substituting  yfin+^>  x3^  and  assuming  the  same  gaussian 
conditions  following  (22),  (27)  becomes 

Jn+l|n(i)  =  ic^n)"  fdfi^  >(x)  »oQo '  JN[z-h(x)  ,R] Jnj ^(xjdx  (28) 

Equations  (11),  (27),  and  (28)  are  referred  to  as  the  conditional 
one-step  predictor  density  update  equations. 


--V  -  «  '  "  is- 


Relations  between  the  one-step  predictor  and  the  filter  density 
update  equations  may  also  be  written  as  follows. 

Use  (26)  for  the  one-step  predictor  in  the  filter  equation,  (22), 


to  obtain 


Jn|n(^  =  K(n)  Pv  Jn|n-l('4i) 


and  use  (29)  in  (27)  to  obtain 


Jn+l|n<‘^n+l')  K(n)  3Jn|  n(^n)d4 


Tables  1  and  2  summarize  the  above  results. 

For  completeness  the  smoothing  filter  is  also  included,  although 
these  results  were  not  used  in  this  research. 


px  (4-r^,,,-,-50)  =  VrlA-P 


px  ,z  . z„(W4 . V 

n-r  n  0 


pz  ,...,z  <V“'’Vr+l,Sn-r’*' 

n  -n-r+l 


’V  Pz  ,...,zn(-in-r’,*,,i0) 
n-r  ’-0 


The  first  term  in  the  denominator  is  abbreviated  c(n,r). 
px  )=^7x)JTJ 


p  yr>2n»  •  •  • » y  y  •  •  •  . 

P  ( £  , . .  •  >  Cj-i) 

z  , .... z.  -n-r  -0 


d5  . . .  d£ 

•^n  -^n-r 


P  „  (JL,  >  5  »  •  •  •  »_£n) 

_1 _  /Yxn-r,2n-r,,,,,';0~n  ^ 

'(.n,v)JVJ?z  ,...,z/in-r . V 

n-r  3 


n-r+i  n  n-r+l 


(4 . 4i-rH,Sn . 4i-rtl^-r~in-r) 


•  “in  -  “Vrfl 


r*«*r' 


(x)dx 


Note:  The  index  n  has  been  omitted  in  some  places  for  notational  simplicity. 
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III  Finite  Dimensional  Approximations 
A.  Introduction 

The  formal  Bayes-Law  calculations,  reviewed  in  detail  in  the 
previous  chapter,  are  functionally  simple  but  their  implications  to 
computation  are  formidable.  To  begin  with,  the  representation  of 
densities  is  in  general  an  infinite-dimensional  problem.  Although 
there  are  many  examples  of  density  families  characterized  by  a  finite 
number  of  parameters,  the  Bayes-law  computation  rarely  reproduces 
another  member  of  the  same  family.  The  most  well-known  exception  is 
the  Caussian  family,  which  is  reproduced  under  linear  transformations, 
resulting  in  the  widely  used  Kalman-Bucy  filter,  which  optimally 
describes  the  Bayes-law  computation  in  terms  of  linear  differential 
(difference)  equations  fc»“  the  mean  and  covariance  of  the  Gaussian 
conditional  densities,  provided  that  the  underlying  physical  system 
is  described  by  linear  differential  (difference)  equations  with 
additive  Gaussian  inhomogeneities.  If  the  physical  plant  is  not 
linear  or  the  disturbances  are  not  Gaussian,  however,  the  Bayes-law 
rarely  leads  to  a  reproducing  family  of  densities.  Thus  an  improvised 
or  unnatural  parameterization  of  the  densities  is  required  in  order  to 
implement  Bayes-law  and,  in  general,  an  infinite  number  of  parameters  is 
required  for  exact  representation.  The  numerical  approximation  problem, 
then  is  simply  s'eted:  how  do  we  choose  an  appropriate  (finite) 
subset  of  the  parameters  to  represent  a  given  collection  of  conditional 
densities?  Since  the  answer  to  this  fundamental  question  depends 
heavily  on  the  densities  in  question,  it  turns  out  that  little  can 
be  said  about  the  applicability  of  a  given  approximation  without 
discussing  specific  example  problems.  Even  for  a  given  problem  there 

Preceding  page  blank 
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may  be  several  different  appropriate  numerical  approximation  schemes, 
depending  upon  the  problem  parameters.  Examples  of  these  dependences 
will  be  discussed  in  Chapters  VI  and  VII. 

In  the  present  paper  we  will  discuss  some  representatives  from 
several  categories  of  density  approximations  as  well  as  their 
Bayes-law  implementations  and  associated  difficulties.  First,  we 
discuss  orthogonal  series,  covering  a  candidate  with  unbounded  support 
(Gauss-Hermite  polynomials  -  Section  B)  and  also  a  candidate  with  compact 
support  (Trigonometric  series  -  Section  E) . 

Next,  we  discuss  an  approach  involving  nonorthogonal  polynomials 
which  is  intended  to  provide  positive  density  approximations  for  any 
finite  number  of  terms  (Section  D) .  Thirdly,  we  discuss  an  intuitively 
simple  approach  to  density  approximation  involving  point  masses  (Section 
C) ,  suitably  distributed  so  that  most  of  the  probability  is  adequately 
covered  by  a  small  number  of  discrete  points.  Finally,  we  describe  a 
relatively  recent  additional  approach  to  the  problem  using  numerical 
spline  functions  (Section  E) .  The  presentation  of  this  paper  is  intended 
only  to  be  representative  and  not  exhaustive,  since  there  are  many  approaches 
to  numerical  approximation  which  have  yet  to  be  considered. 

B.  Orthogonal  Series 

The  general  theory  of  orthogonal  series  may  be  found  in  many 
places  in  the  literature  (see,  for  example,  [ 10 ] ) .  In  this  section  we 
intend  only  to  provide  two  contrasting  examples,  whereby  we  illustrate 
the  significance  of  the  type  of  state-space  required  for  the  applica¬ 
tion  at  hand.  If  the  state  vector  can  take  on  all  values  in  Rn 
with  positive  probability,  then  orthogonal  polynomials  must  be  used 
to  provide  the  necessary  approximation.  On  the  other  hand,  if 


the  state-space  actually  is  or  can  be  approximated  as  compact  then 
a  periodic  function  (such  as  trigonometric)  might  profitably  be 
employed  as  a  basis  for  an  orthogonal  series.  It  may  happen  that  a 
given  problem  may  be  interpreted  in  either  way  (see,  for  example, 
Chapter  VII),  so  that  more  than  one  orthogonal  series  may  be 
appropriate,  depending  on  the  performance  desired. 


n 

f(x)  »  y(x)  ak0k(x)  (1) 

where  0q(x), . . . ,0^(x)  are  the  required  polynomial  functions.  The 
approximation  is  to  be  the  best  in  the  sense  Lnat 


(f (x)-y(x) ]2dx 


D 

=  J  w(x)[f(x)-£  ak0k(x)]^dx  =  Minimum 


with  w(x)  a  specified  weighting  function  which  is  assumed  non-negative 
on  the  interval  (a,b). 


Equation  (2)  imposes  the  condition  on  the  coefficients  a^. 


-i  fw(x)[f(x)-  £  a,  0.  (x)]2d;<  =  0  (r=0,l, . . .  ,n) 

r  J  a  k=0  K  K 


from  which 


n  b  b 

^0  ayJ™  (x)  0r  (x)  0^  (x)  dx  =/aw(x)0r(x)f(x)dx  (r=0,l, . . . ,n)  (4) 


The  coordinate  functions  are  chosen  to  be  orthogonal  to  each 
other  over  the  interval  (a,b)  with  respect  to  the  weighting  function 
w(x). 


r(x)0r(x)0k(x)dx  =  0  r^k 


The  "uncoupled"  equations  reduce  to  (omitting  the  argument  x) 


o  D 

*  /  =/  w0  fdx 

va  r  */a  r 


(  w02dx 
a  r 


To  construct  the  polynomial  functions  0q(x),0j(x),  . ..,0r(x),  it 
is  required  that  the  polynomial  0f(x)  be  orthogonal  to  all  polynomials 
of  degree  inferior  to  r,  over  the  interval  (a,b)  with  respect  to 
the  weighting  function  w(x). 


\  *r  '•  • 


v 


ytW(x)0r(x)qr_1 


(x)dx  =  0 


where  q^is  an  arbitrary  polynomial  of  degree  r-1  or  less..  The 


notation  is  introduced 


drU  (x) 


W(x)0  (x)  =  r  »  t  f)(x) 

r  .  r  r 


so  that  (7)  becomes 


/ur<r)wVl(x)dx  =  0 


After  r  integrations  by  parts 


^r^Vr11^^  ^-l+*”  +  <“l)r“\q^1>]b  -  0 

a 

The  requirement  that  0r(x)  be  a  polynomial  of  degree  r  implies 
that  Ur(x),  from  (8),  satisfy  the  differential  equation 


r+1  w(x) 


.  drU  (x) 
1  r 
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in  (a,b)  whereas  the  requirement  (9)  be  satisfied  for  any  values  of 

qr-l^’  qr-l^’  qr-l^’  q(r-l)^’  etc‘  leads  t0  the  boundary 
conditions 


U  (a) 

=  u'(a) 

=  ILCa)  -  ...»  t/r-1)(a)  =  J 

r 

r 

r  r 

ur(b) 

=  u'  (b) 

r 

= . =  l/r_1)(b)  =  0 

r 

(ID 


For  each  integer  r,  a  solution  of  (10)  which  satisfies  the  bound- 

tVi 

ary  conditions  (11),  is  the  r  member  of  the  set  of  polynomial  func¬ 
tions,  0^(x),  given  by 


0rCx)  = 


1 

W(x) 


drUr(x) 


dx 


(12) 


The  numerator  to  compute  the  coefficient  a^,  given  by  (6)fis  a 
function  of  f(x).  The  denominator,  designated  yr>  is  independent  of 
f  and  need  be  computed  only  once. 


/  2  r  hr 
y  =  f  w (x)0^(x)  =  (-1)  r!Ar /Ur(x)dx 

r  J  a  J a 


(13) 


where  A^is  the  leading  coefficient  of  0r(x). 


0  (x)  =  A  xr  +  A  ,xr  1+ 
r  r  r-1 


+  A 

o 
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It  is  shown,  for  use  in  the  integration  formulas,  that  if  w(x) 
does  not  change  sign  in  (a,b),  the  polynomial  0^(x)  possesses  r 
distinct  real  zeros,  all  of  which  lie  in  the  interval  (a,b). 

For  application  to  the  problem  of  Chapter  VII  of  this  paper  we 
want  to  approximate 

f(x)  =  y(x)  =  v(x)  ]T  br0r(x)  (14) 

r=0 

such  that 


/ “<*>  -£ob^  ]  ^  "  "lnl""“ 


(15) 


which  leads  to  the  result 


1  r 

b  =  —  [  -*0  d* 
*  Yr  /  v  r 


(16) 


which  is  equivalent  to  minimizing  the  squared  error  (f-y)  with  re¬ 


spect  to  the  weighting  function  In  the  application  we  let 

v 


w  =  v  =»  e 


2  2 
-a  x 


(17) 


and  the  interval  (a,b)  is  (-O0»“)>  which  leads  to  the  Hermite  formulas. 
For  the  above  choice  of  w(x) 


0  00  =  e 


2  2  drU 
a  x 


dx 


r 

r 
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where  satisfies  (equation  (10)) 


2  2  ,r„  ' 
ax  d  U 

e  _ I  =  0 

L  HvrJ 


and  from  the  boundary  conditions,  (11)  requires  and  its  first  r-1 
derivatives  to  tend  to  zero  as  x-Hf5 
The  function 


U  (x)  =  C  e 
r  r 


2  2 
-a  x 


has  the  property  that  its  r  derivative  is  the  product  of  itself  and 
a  polynomial  of  degree  r.  It  therefore  satisfies  (18)  and  the  boundary 
conditions. 


0  (x)  =  C  e 
r  '  r 


o  o  r  2  2 

<ir  .or”  *  > 


The  Hermite  polynomial  of  degree  r  is  defined  by  taking 


C  =  (-l)r  and  a^  =  1. 
r 


2  ,r  _  2 

H  (x)  =  (-l)V  (e  )■ 
r  dx 


Cr  =  (-a) 


0r(x)  =  Hr(ax) 
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gives  for  the  coefficients 
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(r^-OQ,  01,10,11,02 . nynp 


The  polynomial  functions  are  the  same  ones  used  for  the  scalar 

case.  The  denominator  of  (29)  is  given  by  y  y  where  v  -v  as 

rl  r2  r  r 

given  by  (13).  i 

The  two-dimensional  approximation  that  is  needed  is 


f(xlx2)  =  y(xlx2J  =  vl(xl)v< 


\  ®2 

^VE  E  br  r  0  (x  >0  (x 

^=0  r9=0  12  1  1  r2 


where,  analogous  to  the  scalar  case, 


I 


p 
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and  using  the  results  of  the  scalar  case, 


yCx^)  = 


2  2  2  2 
-a.x.  ~a0x0 

11  l  Zr-*1 

e  e  Yj 
ri=0 


jS 

V  b  H 
r^=0  rir2  rl 


(a1x^H. 


r2<V2> 


(31) 


<») 


ala2 


rlr2  r  r 

2  V2 


if 

*/— oof  —a 


f(x1x2)Hr  (a1x1)Hr  (a2x2)dx1dx2 


(32) 


3.  Gauss-Hermite  Integration 

The  results  given  here  are  developed  in  Hildeorand  [9]  for  in¬ 
tegration  of  a  function  of  one  variable.  A  simple  extension  is  made 
here  for  functions  of  two  variables.  The  parts  of  the  theory  presented 
are  limited  to  those  needed  to  explain  the  methods  used. 

The  values  of  a  function  f (x)  are  assumed  known  at  m  points, 
x=x^,x2, . . .  jX^.  If  the  value  of  the  derivative,  f'(x),.is  known  at 
the  same  points,  a  polynomial  of  degree  2m-l  can  be  constructed  to  agree 
with  f(x)  at  the  m  points.  An  integration  formula,  using  Lagrangian  in¬ 
terpolation,  would  give  a  numerical  integration  equation  with  a  degree 
of  precision  of  (2m-l)*.  Use  cf  Gaussian  integration  requires  the  m 
points  to  be  selected  in  a  certain  way,  but  leads  to  formulas  which  have 
the  same  degree  of  precision  without  the  requirement  for  knowledge  of 
the  derivatives  at  the  m  points. 

*An  integration  formula  which  yields  exact  results  when  i (x)  is  a 
polynomial  of  degree  r  or  less,  but  fails  to  give  exact  results  for  at 
least  one  polynomial  of  degree  r+1,  possess  a  degree  of  precision  of  r. 
(Hildebrand  [9]. 
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Use  is  made  of  the  auxiliary  functions 


7i (x)  *  (x-x  )  (x-x)  ...  (x-x  ) 

12  m 


^(x) 


Cx-x^1  (xi) 


with  the  properties 


77  (x^ )  =  0 


i. (x.)  =  6..  (Kronecker  delta) 

i  3  ij 


Let  y(x)  be  a  polynomial  of  degree  2m-l,  which  agrees  with  f  and  f'  at 
the  m  points.  It  can  be  expressed  in  the  form 


y«  -|>kWf<V 


where  h^  and  h^  are  polynomials  of  maximum  degree  2m-l.  Using  the 
properties  of  the  auxiliary  functions  the  polynomials  h^  and  h^  are 


given  by 


hi(x)  =  [l-2£i(x)(x-x1)]t£i(x)]' 


h^x)  -  (x-xjL)f£i(x)] 

Integration  of  f(x)  times  3  weighting  function  w(x),  using  the 
approximation  y(x),  Equation  (35)  gives 
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b 

(x) f (x) dx  =Yj  + 


+  E 


(37) 


with  the  weighting  coefficients  defined  by 

br 

wh.dx 


h-f 

i  J  w 


U 

ih  =^*  wii^dx 
a 


and  the  error 


f <2m)  (y)  f  2 

E  "  ~(Zm)l  J  w(x)[ir(x)]  dx 


(38) 


(39) 


(40) 


with  y  some  point  on  the  interval  (a,b). 

Using  (33),  (34)  and  (36),  Equation  (39)  can  be  expressed  as 

b 

jj  -  — -  i w(x)7r(x)£l.  (x)dx  (41) 

*  (xi)  i  1 

so  that  will  vanish  if  n(x)  is  orthogonal  to  £^(x)  over  (a,b) 
relative  to  the  weighting  function  w(x).  The  points  x  ,...,xm  are  the 
m  zeros  of  ir(x).  Each  £^(x)  is  a  polynomial  of  degree  m-1,  so  the 
requirement  that  n(x)  be  orthogonal  to  all  polynomials  of  degree  in¬ 
ferior  to  m  is  a  sufficient  condition.  (It  is  also  shown  to  be  a 
necessary  condition).  The  orthogonality  req-dvement  is  identical  to 
the  requirement  given  by  (7),  and  therefore  defines  the  same  types 
of  polynomials,  normalized  to  make  the  leading  coefficient  equal  to  one 
(Equation  (33)),  The  m  zeros  of  7i(x)  are  r«al,  distinct,  and  are  in  the 
interval  (a,b). 


3? 


A  substantial  amount  of  manipulation  is  required  to  determine, 
explicitly,  the  weights  H^  defined  by  (38).  With  the  leading 
coefficient  of  0m(x)  (Equation  (12)  and  from  (13),  the  following 
two  equivalent  forms  are  determined: 


with  the  m^  Hermite  polynomial  (Equation  (23)),  and  where 

A  =  2m  (45) 

m 

and 

y  =  2mm!  (46) 

m 

from  (25)  with  a=l. 

The  coefficient  Hj,  can  be  determined  from  either  (42)  or  (43)  with 
0^  replaced  by  the  Hermite  polynomial  H^,  or  equivalently  using  a 
relation  of  the  Hermite  polynomial  and  its  derivative 
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H. 

a. 


IWX±)]2 


(47) 


In  summary  the  Gauss-Hermite  formula  is  of  the  form 


/■ 


e  f(x)dx  = 


+  E 


(48) 


,  th 


where  x^  is  the  i  zero  of  H^tx)  and  using  (40)  and  yr  as  defined  by 
(25) 


E  = 


2™ (2m)! 


-f  (2m)a) 


(49) 


with  £  some  point  in  (-00,") . 

An  extension  to  functions  of  more  than  one  variable  is  made  fol¬ 
lowing  the  scalar  method  and  using  the  extension  technique  previously 
described  for  the  polynomial  approximation,  to  obtain 


2  2 
00  -x,  -x 


ff 

—00  —00 


e  1e  2f (x,xn)dx1dx0  =>J 


12  12 


v1  ^AV\,X2k2} 


(50) 


with  the  weights  and  zeros  as  defined  for  one  dimensional  integrations. 


4.  Application  of  Polynomial  Expansions  to  a  Two-Dimensional 

Filtering  Problem 

In  this  section  we  will  develop  the  equations  of  the  previous  section 
for  a  particular  example,  which  we  intend  to  discuss  later  in  Chapter 
VII  in  great  detail.  The  appropriate  equations  describing  the  Bayes- 
Law  update  for  the  phase  demodulation  problem  (see  Chapter  VII)  are 
repeated  as 


Beginning  from  these  eauations  we  now  will  develop  a  Hermite 
recursion  formula.  We  assume,  a  set  of  quantities  are  available  at  a 
particular  time  which  completely  characterizes  the  conditional  density 
function,  and  demonstrate  how  these  same  quantities  are  computed  for  the 
following  time  increment. 

1.  From  the  prior  stage  we  have  the  means,  covariance  matrix, 
characteristic  values  and  vectors  of  the  covariance  matrix,  and  a 
series  expansion  of  the  conditional  density  in  characteristic  vector 
coordinates  in  terms  of  Hermite  polynomials. 


2.  The  series  form  of  the  density  is  put  into  Equation  (51)  and 
then  one  takes  the  Fourier  transform  of  both  sides. 

3.  Rework  the  result  of  Step  2  to  take  the  inverse  Fourier  transform 

to  obtain  J  . 

n+l|n 

4.  Multiply  Jn|n_^  by  the  observation  process,  Equation  (52),  to 

obtain  J  i  . 
n  j  n 

5.  Compute  moments  for  means,  covariance  matrix,  and  for  the 
coefficients  of  the  new  expansion. 

The  above  outline  is  illustrated  in  Figure  1.  In  the  figure  we 
see  two  types  of  operations,  analytic  and  computational.  The  analytic 
operations,  outlined  in  the  above  steps,  enable  one  to  compute  re¬ 

cursively  when  J  i  is  available  in  terms  of  the  coefficients,  b  . . 

J  n-i|n-]  ’  r£. 

The  computation  task  is  then  reduced  to  simply  evaluating  the  coeffi¬ 
cients  for  the  updated  density  function,  as  shown. 

The  normalizing  constant  is  omitted  from  the  following  sequence; 
the  result  is  then  correct  to  within  a  multiplicative  constant,  which 
is  evaluated  and  inserted  in  the  computer  program. 

The  modal  matrix  of  the  covariance  matrix  of  the  prior  cycle, 
designated  Q,  is  normalized  to  make  it  an  orthogonal  matrix 

Q’  =  <f 1  (53) 

We  designated  vectors  in  characteristic  vector  coordinates  as  v,  and  in 
the  position  and  velocity  coordinates  as  x.  Then 
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v  =  Tx 


(54) 


where 


til 

fc12 

C21 

1 2  2 

The  polynomial  expansion  of  the  two-dimensional  density  is  given 


by 


In  (55)  Ai,A2  are  the  characteristic  values  of  the  covariance  matrix, 
and  Vj,  v2  are  the  expected  values. 

The  corresponding  manipulations  required  to  describe  the  prediction 


density  Jn+1|n  begin  with 


J  ,  (  )  =  exp 


n  |  n  \  x2 


— ^-(t  x  -t  x  -v  ) - —  (t  x  +t  x  -v  ) 

2A i  \  1 1  i  12  2  1/  2*2  '  21  1  22  2  2/ 


m ,  m  „ 

jtt 

r=0  2=0 


b  H 
*2  r 


W2X> 


—  ft  X  +t  X  -V  )  H  =—  (t  X  +t  x  -v  \ 
A1  v  11  1  12  2  2J  [%'2>2  V  21  1  22  2  2  ' 


(56) 
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Next,  we  substitute  y  -x  A.  for  x  to  prepare  for  the  use  of  J  ,  in 

12  1  n[n 

(51).  After  some  rearranging  we  obtain 


I y  -x  A\  ml  m2 

V,|n(V  • 

2  /  r=0  2=0 


(57) 


where  we  have  defined  the  terms 


F(x  )  =  exp 
r  2 


SMyIyyV,)] 


'  nVtv  KyvJ 

i  i 


VV 


A 

1  r  /  m  „  v 

2 

=  exp 

- -  x  -  (  Tv-Ty 

2*^3  l  2  '  3  2  4 

sign  T3 

1 2\  t? 

1  2  3 


[v( 


T  v 
3  2 


T  y 
4  1 


(58) 


with  T  ,  T  ,  T  ,  and  T  given  by 
12  3  4 


T  =  - - -  =  sign  (T  )  Abs  (T  ) 

1  t  -t  A  1  1 

12  )  1 

T  &  _ Ld_. 

2  t12"tll^ 


•J*  A 


3  '  ‘  sl8"  <V  Abs  <T,> 


ana 


A  t 


21 


t22_C2lA 
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Now  it  happens  that  the  prediction  density  (51)  is  a  convolution  with 

respect  to  y  ,  so  that  we  may  combine  (57)  with  (51)  and  let  the  asterisk 
2 

denote  convolution  to  obtain 


Finally,  we  are  ready  to  take  the  Fourier  transform  of  both  sides  of 
(59),  giving 


2qA 


m  m 

EE 

!r=0  i=0 


b  e* 

rit  y 


F  (y 
r  , 


Vy2 


•>] 


(60) 


The  algebraic  details  of  evaluating  the  transforms  are 
Hecht  [9  ].  The  result  may  be  expressed  as  a  function  of 


given  by 
the  transform 


variable  u>  as 
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To  complete  the  cycle  it  is  necessary  to  approximate  the  density 

function  J  i  with  a  new  Hermite  function  as  characterized  by  a  new  set 
n|n 

of  coefficients,  b 

In  the  previous  Sections  it  was  shown  that  under  the  proper  conditions 
a  function  of  two  variables,  f(x],X2)  can  be  approximated  by  a  series 
expansion  of  Hermite  polynomials  of  the  form 

2  2  2  2  ®]  m2 

f  (X  x  )  «  y(x  X  )  =  e"aiXle"a2x2  LEb  ,H  (b.Xj)H  (x,x,)  (63) 

12  12  r=0i=0  ™  r  1 


with  the  coefficients,  b^,  determined  by 


b«-  r 


T2  ft  *<*1*2>Hr<vl)VV2)  'V’S 

r!2  tU  -Z  J, 


(64) 


The  approximation  is  the  best  in  the  sense  that 


00  00  2  n,  r  ^ 

If  ealXl  ea2X2  Jf(XjX2)  -  y(x1x2)J  dxjdx2  **  minimum  (65) 


—00  —00 


The  conditions  for  the  approximation  formulas  to  be  vaiid  are 
(Hildebrand  [ io  3  ); 

1.  e^H  e"«2x2>  o  in  (-,«) 


00  00 

.  ffx\ke«  l2*le-«1x 

—oo  —oo  1 


2dxjdx2  exist  for  all  nonnegative  integral 


values  of  k. 
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3.  The  integral  in  Equation  (64)  exists. 

For  the  choice  of  af  =  _1—  and  = 

2of 


■  "2  ,  with  Oj  and  a| 
2a2 


the  characteristic  values  of  the  covariance  matrix,  the  first  two 
conditions  are  obviously  always  true,  and  if  f(xjX2)  is  an  exponentially 
decaying  density  function,  condition  3  is  true.  In  the  developments 
of  this  section  we  associate  f(xjX2)  with  an  approximating  density 


function  in  the  form  of  (51)  with  dn|n  j  given  by  (62).  Since  the 
coefficient  function  of  Jn|n_,  in  (51)  is  bounded  for  all  values  of 
the  arguments  (and  from  reviewing  the  form  of  (62)),  it  is  clear  that 
condition  3  is  true  even  for  the  approximating  density  function  for  any 
value  of  m. 

Let  f(XjX2)  be  the  density  function  we  wish  to  approximate,  and 
let  m=0.  Then 


b 

oo 


f(XjX2)  dXjdx2 


£Li^l 

TT 


(66) 


y  (xjx2) 


al°2  -«lxi  "X2X2 
-  e  e 

71 


(67) 


Let  o j  ■  and 


be  the  variances  of  f(xjX2),  and  let 


a 


2 

1 


a 


2 

2 


2a 


2a 


2 

1 


1 

2 

2 


- .  fV-  - - - 


Equation  (67)  is 


-  *1  “  x2 

y(XjX2)  =  _  e  2o|  e  20  2 

2vyfo1a2 


We  see  that  by  taking  only  the  zero’th  term  of  the  expansion  we  can 
get  the  best  gaussian  fit  to  the  true  density  function,  and  also  a 
verification  on  the  form  of  the  coefficient  equation  (64). 

In  addition  to  the  above  assumptions  (except  let  m>0) ,  let  rij  and 

t*H 

n2  be  the  means  of  f(xjX2),  and  let  p  be  the  i-j  central  moment. 


-f  f  (x2-n2)J  f(XjX2)  dx^ 


The  product  of  the  Hermite  Polynomials  when  expanding  about  the 
expected  value  can  be  written 


with  .a.j  a  function  of  r,  i,  i  and  j. 
r£  lj 


Equation  (64)  is  then  rewritten  as 


^  1L  tcx*>i§  i  w-*., 


ulu2  ^  J>  ail 

- - 7 -  LI  r£  i,aia>i 

2  r !2  SLl*  i=0  j=0  j  j 


(71) 
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Comparing  (63)  with  (60),  we  see  the  approximating  polynomial 
expansion  was  accomplished  by  letting  * 


1 


a?  _  _L_ 

2  "  2a2  »  nl  =  V!  ’  ,)2  =  v2 


and  using  (71)  to  compute  the  coefficients,  br£.  The  coefficients, 

rla±y  (70)  and  (71)  are  functions  of  the  coefficients  of  the 

Hermite  Polynomials,  and  can  be  computed  one  time  in  advance  and  stored. 

Ihe  formula  for  generating  the  Hermite  coefficients  is  given  in  the 

previous  section.  Designating  c  as  the  coefficient  of  xn  in  H  (x), 

mn  m 

r£aij  in  Equation  (70)  is  determined  for  each  r  and  for  each  £  from 


5.  Applying  the  Hermite  Expansion 
The  general  formulas  of  the  previous  section  are  specialized  here 
for  the  case  when  the  system  equations  are  those  of  the  phase  demodulator 
(see  Chapter  VII).  This  section  describes  the  techniques  that  were  used 
to  mechanize  the  given  equations.  In  particular,  the  conditional  means 
V  the  characteristic  values  A^,  and  vectors,  t^ ,  of  the  covariance 
matrix,  and  the  coefficients  of  the  Hermite  approximation,  b  are 
described  in  detail  for  this  specific  example. 

To  perform  the  integrations  to  obtain  the  means  and  central  moments 
(to  compute  the  quantities  in  the  above  paragraph)  Equations  (51)  and 
(62)  are  combined  into  one  equation,  omitting  temporarily  the  normalizing 


constant. 
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and  Jn|n  1»  given  by  (62),  consists  of  two  parts,  an  exponential 
function  and  a  polynomial  function  of  the  arguments  y^  and  y2« 

The  exponent  of  the  exponential  part  was  a  quadratic  form;  i.e.,  of 
the  form  {-a  ||x-x  ||2  }.  To  perform  integrations  of  (73)  and  Equatf  n 
(73)  times  yiy^,  the  combined  exponential  function  is  put  into  the 
following  form  in  order  to  use  the  Gauss-Hermite  formulas 


exp 


1  f(yrmi)2 

2 


2P(y1-m1)(y2-m2)  (y  )2 

- +  — — - — 

°1«2(i-P2)  aj(i-p2) 


f(yiy2) 


(75) 


with  the  exponent  of  S(yjy2)  of  (73)  (as  given  by  (51)),  linearized, 
and  the  linear  terms  combined  with  the  linear  exponential  terms  of 
^n|n-l ^yly2^ •  T^e  error  between  the  linear  and  nonlinear  parts  of  the 
exponent  were  combined  with  the  polynomial  part  of  Jn|n  j  to  form 
F(yjy2).  This  technique  was  suggested  in  Bucy,  Geesey,  and  Senne  [5]. 
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The  terms  of  (75)  are 


m!  =  ECyj) 
m2  =  E(y2) 

°1  -  E[(y1-m1)2j 

°2  =  E  [(y2-m2 )] 


P 


correlation  coefficient 


E[(y1-mI)(y2-m?)] 


We  operate  on  the  exponential  part  of  (74)  as  follows. 


~  ^  [(zi"  C0S  yi)2+  ('z2~  sin  yi)2] 

A  ,  _  _  A 

-  2r  (2!-  cos  yi-  cos  yi  +  cos  yj)-  —  (z  -  sin  y, 

2r  i 


-  sin  y.  +  sin  y,)2 

1  (76) 

with 

"cos  yl  "  cos  y*  +  yj  sin  y*  -  y^in  y* 

sin  y}  =  sin  y*  -  y*  cos  y*  +  yjcos  y* 

yl  =  E{yi(h)  |  2<h-l),  ...,z(0)] 
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(77) 


The  first  line  on  the  right  of  the  equal  sign  is  the  linearized 
exponent  which  is  combined  with  the  exponential  terms  of  Equation  (62) 
The  second  and  third  lines  are  reworked  to  give 


z.A  z2A  z,A  /  \ 

+— 1—  cos  y,  + -  sin  y,  + - (y ,  — y*  1  sin  y  * 

r  1  r  l  r  \ll '  l 


Z2A 

r 


sin  y*  + 


(78) 


as  the  exponential  part  of  the  function  to  be  evaluated  for  the  inte¬ 
gration.  That  is  F(yjy2)  =  [exponent  of  (78)].  (polynomial  part  of 
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The  required  integrals  are  of  the  following  form 


00  oc 

// 


—00  —00 


(79) 


Thus  we  compute 


JmJyA=  exp 


|-  lT(!i~mi)2  2P(yrm1)(y?-m?)  +  (y^-m^l 

k  2  (-I- p 2 )  cr1o2^-p2)  2)-* 


F(yiy2) 


F(y1y2)  "  exp|-fe-  (yryi*)2+-^  cos  yi  +  *£ 


Z2A 

V-  sin  Yi 


+  ^  (yiyi*)sin  y  *-  ^cos  y*-  ^ 


yr-  -f sin  yl 


22A  * 


+  —^r  (yi  -  ?i)  cos  yj’ 


m  m 


r+i  /  ^  -p2^ 

L  L  b  (sign  T  )r  (sign  T  )£  (B)  2  I - LA 

n=0  £=0  3 


*4.1 

|-T- 


i?)  <-»' 

k=0Y/ 


(qA)‘ 


X,Tfq 


k/. 


|_A2T§q  +  XjA^fT3  +  Ajq- 


r-k 


^[V2-(vrv,)J  \+t  1  (v,-T,;| 


(80) 
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Using  the  Gauss-Hermite  formula 

|  m  m 

x2  -x2  G(x  x  )  dx  dx  £  w:  w  G  (x ,  x  )  (92) 

1  2>  12  2  l-i  3-i  h  23  *1  23 

the  mechanics  of  the  integrations  are  as  follows: 

1.  The  eigenvalues  and  eigenvectors  of  the  covariance  matrix  of  the 

linearized  equations, 

V 

Pal°2 

are  evaluated.  Figure  2  illustrates  the  various  axes  directions 
described  here. 

2.  The  formula  (92),  is  used  with  the  grid  points  scaled  propor¬ 
tional  to  VF*!  and  ^2\2  and  the  directions  along  the  correspond¬ 
ing  eigenvectors  are  referred  to  as  the  rotated  coordinates. 

That  is, 

*li  =  ypl  ti 

*2  =  yp2  h 

i  =  l,n 

n  -  number  of  points  for  integration  in  each  dimension  with  ^ 

the  abscissa  value  given  in  the  numerical  integration  tables. 
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3.  The  points  x^.  and  x^  .are  rotated  back  to  the  coordinate 
axis  using  the  transformation  matrix  formed  by  the  eigenvectors. 


The  points  are  then  shifted  to  be  centered  about  mt  and  m2. 

4.  The  function  (80)  is  evaluated  at  each  of  the  n2  points,  y^, 


5.  Moments  of  the  required  degree,  k  and  l,  were  found  from 


j  m  m  k  ^ 

l-u  ’ ^££*1 


where 


I 

oo 


m 

»  j-i 


6.  The  means  and  central  moments  (in  the  rotated  coordinates)  are 
deterri  .ned  according  to  the  formula  (94)  given  below. 

7.  Tl.  •  eigenvalues  and  eigenvectors  of  the  covariance  matrix  formed 
by  the  second  central  moments,  in  the  rotated  coordinates,  are 
evaluated. 

8.  The  central  moments,  p^,  required  for  the  Hermite  expansion  in 
the  directions  of  the  eigenvectors  determined  in  Step  7,  using  the 
results  of  Step  6,  are  evaluated,  using  the  formula  (94)  given  below. 

9.  The  equations  of  Section  I  are  used  to  determine  a  new  set  of 
coefficients,  b^,  to  characterize  the  new  density  function. 

To  perform  the  computations  of  the  above  steps,  two  relations  are 
needed:  1)  a  formula  to  convert  moments  to  central  moments,  and 
2)  a  formula  to  compute  central  moments  about  a  rotated  set  of  coordi¬ 
nates.  The  first  formula  is  given  as  follows: 
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Upon  computing  the  coefficients,  b^,  for  the  Hermite  expansion, 
the  following  observations  are  made: 

1.  The  coefficient  b  is,  from  (66), 

b  =  —  (2Aj)  ^2(2a 

oo  it  aA27 

=  normalizing  constant  for  a  gaussian  density  with 
variances  A^Aj  . 

2.  b1A  =  b  .  =0,  from  (71). 

10  cl 

3.  b^  =  0,  from  (71).  When  expanding  in  eigenvector  coordinates 

“n  •  °- 

4.  b2Q  =  bQ2  =>  0,  from  (71).  The  Hermite  polynomials  are 

(Section  2) 

-  1 

H  j(x)  =  2x 
H  2(x)  =  4x2-2 

To  compute  b2Q,  for  example,  using  (71), 

b20  ’  b00^r[20ai<,+25aio:2»)''/2',lp+2oa20(2Xl)'1''2o] 


where  from  (72) 
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2o  00 
2oaio 
2oa2o 


-2 

0 

4 


2o 


1 


oo  222! 


2  +  2X 


-1 
1  l 


20 


]"• 


since  p20=  \u 


5.  Noting  bQo  is  only  a  normalizing  constant,  the  lowest  coefficients, 
b^,  which  contains  information  beyond  the  gaussian  fit  are  b3o,  b21, 

b j2 i  and  bo3.  The  reason  for  this  is  intuitively  clear. 

a)  b^  and  bgi  account  for  the  expected  value  of  the 
approximating  gaussian  function  being  different  than  the  actual 
function. 

b)  bj,  accounts  for  the  lowest  order  cross  term  of  the 
approximating  gaussian  function  being  different  than  zero. 

c)  b2Q  and  bQ2  account  for  the  variances  of  the  approxi¬ 
mating  gaussian  function  being  different  than  the  actual  function. 

6.  The  Hermite  expansion,  as  given  in  Section  2,  is  such  that 
addition  of  higher  terms  of  a  truncated  series  does  not  affect 
lower  order  coefficients  (Wiener  [14]).  For  example,  in  (63), 
if  m  were  changed  to  m+1,  b  ^(r<m,  2<m)  would  not  change. 

When  developing  the  computer  program,  experiments  were  made  to 

determine  the  numer  of  terms  to  carry  in  the  series  expansion,  i.e., 
the  value  of  m  in  (63).  The  value  was  related  to  the  extent  of  the 
nonlinearities.  For  the  phase-lock  problem  (Chapter  VII)  the  non- 
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linearity  was  approximately  proportional  to  the  parameter  p^  (0)  = 

V2  r3/,l*q1/,lt.  For  Pi  1  (0) 1  ^2<  0.1  radian  the  linear  and  nonlinear 

l/2 

programs  showed  insig  .ificant  differences.  For  .1  <  Pij(0)  '  <  .55 
radian,  tests  were  ma-  i  for  m  ranging  to  9.  There  was  little  difference 
in  the  sequence  of  es*.  Lmates  between  m=5  and  m=9.  Monte  Carlo  tests  were 
therefore  made  with  m=5. 

It  was  further  noted  in  the  experiments  that  higher  order  cross  terms 
could  be  neglected  for  equivalent  accuracy.  In  (63),  instead  of 
r  =  (0,5) 

4  =  (0,5) 

it  was  sufficient  to  let  r,  Z  =  nonnegative  integers  such  that  r+S,=50. 

This  was  reasonable  in  view  of  (64),  where  it  may  be  noted  b  is 

Tl 

normalized  by  a  factor  — — ^ - .  The  coefficients  b50  or  bnc-,  there- 

2r!2 l! 

fore,  would  carry  3840  times  more  weight  than  the  coefficient  b55,  if  it 
were  used. 

One  additional  alteration  was  male  as  a  result  of  the  experiments, 
to  enable  the  program  to  perform  satisfactorily.  Occasionally,  it  was 
observed,  the  density  function  would  take  on  small  negative  values.  To 
normalize  the  density  function  a  multiplicative  factor  was  used  to  make 
the  integral  of  the  function  equal  tc  one.  This  factor,  of  course, 
multiplied  the  negative,  values  as  well  as  the  positive  values,  which 
created  ot:V.ems.  To  avoid  the  problem,  the  density  function  was  set 
equal  to  zero  whenever  a  negative  value  was  computed.  This  meant  that 
the  Hermj.te  approximation  was  no  longer  the  best  fit  in  the  sense  of  the 

previous  Sections.  However,  the  negative  values  that  were  neglected 

-4 

were  always  small,  never  being  greater  than  about  10  times  the  largest 
positive  value  of  the  density  function. 
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C.  The  Point-Mass  Approximation 

The  method  of  representing  density  functions  by  point  masses  on  a 
floating  grid  was  originated  by  Bucy  [2],  and  refined  by  Bucy  and 
Senne  [6].  We  abstract  a  portion  of  the  latter  reference  here  for  an 
introduction  of  the  general  method.  To  begin  we  let  J  (y)  be  the  one- 
step  predictor  density  and  consider  the  (2M+l)d  points  represented  by 
the  expression 


2M+1 

=  X)  Jn^n(±l*,,*’id)]  <96> 

d  1 »  ^2 »  *  *  * »  ^ 


where  6  is  the  Dirac  delta  function  of  d  arguments. 
Define  a  vector  J  e 


as 


J  A 
-n  = 


(1,1, 


Jn  [gn(2M+1 . 2M+1)]  j  (97) 

» 


and  the  grid  map 


in 


K^Rd 


,  2M+1 


With  this  approximation  the  state  becomes  a  pair  (J  ^ )), 
the  first  a  (2M+l)d  vector,  and  the  second  a  function  specified  by  a 
(2M+l)d  vector  of  its  ordered  images  in  Rd.  In  other  words  the 
effective  state  dimension  becomes  (d+1) (2M+l)d,  since  the  map 
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is  determined  by  a  dx(2tH-l)d  table  or  matrix.  Now  substituting 
(96)  in  Bayes  law  we  arrive  at  the  state  update  equation. 


2M4-1 


CA+  !(1 1 . id)  Tn[Vi(ii  ’  *  ’  *  ’V  ,Sn(ji  ’  *  *  *  ty]4<Ji  ’  * ' 1  ’■ jd) 

.1 1 1  •  •  •  > 


where 


Tn<8n+1'  V  VVV>  r‘(n) 

-  l/2||bn(«n)|l8  ,(»)]«  [V,  -  ,An+i  ^£n+l')] 


with 


(98) 


N(a,  B)A 


(det  B) 


-1/2 


(27!) 


d/2 


exp 


"  I/2M2IIL 


B-l 


and  C  is  chosen  so  that  the  total  mess  of  J  ,  is  one. 
n  -n+1 

Now  (98)  can  be  sequentially  computed  by  a  digital  crmputer,  once 

2d 

the  gridding  is  determined.  However,  we  must  evaluate  (2M+1) 

matrix  elements  arc  multiply  a  (2M+l)d  vector  with  the  matrix,  tn 

order  to  select  the  gridding  to  make  our  approximation  most  effective 

we  will  center  the  grid  for  I  (y)  at  our  best  estimate  of  x  given  z 

n  n  0  n-2’ 

...,zo  [L.e  x(njr:  -2)]  since  the  gridding  for  must  be  given  before 
is  computed  and  hence  only  ^  is  known.  Similarly,  the  mesh  size 
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Let  . e  be  the  eigenvalues  and  eigenvectors  of 

n  n  n  — n 

£(n|n-2).  Then  define  ^  by 


+x (n |n-2) , 


(101) 


and  nQ  is  a  constant  parameter. 

We  call  the  grid  a  floating  grid.  The  floating  grid  is  centered 
at  the  best  available  estimate  of  the  current  mean  and  rotated  from  the 
state  coordinate  frame  into  the  principle  axes  of  the  best  available 
estimate  of  the  error  ellipsoid. 

In  Bucy  and  Senne  [  6 ]  there  is  a  general  discussion  of  the 
computational  problems  associated  with  the  above  scheme,  and  some 
general  (problem  independent)  method  for  reducing  the  computational 
burden.  It  turns  cut,  however,  that  the  most  impressive  simplifications 
generally  take  place  as  a  result  of  peculiarities  of  the  application 
at  hand.  For  example,  the  croblem  of  Chapter  VII  in  this  report 
illustrates  how  one  takes  advantage  of  a  singular  A  matrix  (i.e.  fewer 
driving  noises  than  states).  In  Appendix  C  of  Chapter  VII  we  will 
persue  that  example  closely  to  determine  the  implications  of  the 
simplification  on  me  point-mass  calculations. 


D.  Non-Orthogonal  Series  -  Gaussian  Sums 

In  his  dissertation  Lo  [12]  observed  that  nonlinear  filtering  for 

linear  systems  with  non-gaussian  a  priori  densities  can  be  achieved  to 

any  accuracy  desired  by  approximating  the  a  priori  density  with  a 

suitable  sum  of  weighted  gaussians,  and  initializing  a  bank  of  linear 

Kalman-Bucy  filters  from  the  terms  of  the  gaussian  sum.  Lo  further 

showed  that  certain  nonlinear  systems  with  measurement  functions 
* 

possessing  f  inite-diijiensi^nal  sensor  orbits  (see  Bucy  and  Joseph  [4]) 

can  be  transformed  by  change  of  coordinates  into  linear  systems  with 

non-gaussian  a  priori  densities.  Thus  optimal  nonlinear  filtering  (at 

least  in  the  sensor-orbit  coordinates)  is  achievable  for  such  problems. 

Unfortunately,  the  relationship  between  the  optimal  sensor-orbit  estimate 

and  the  optimal  estimate  in  the  original  coordinates  is  not  simple,  so 

that  if  one  needs  optimal  nonlinear  estimates  in  the  original  coordinates 

\ 

it  is  frequently  advisable  to  retain  the  original  coordinates  and  use 
Bayes  Law,  as  discussed  in  the  previous  sections.  The  question  arises, 
though,  whether  there  is  a  suitable  generalization  of  the  bank  o"  linear 
filters  used  by  Lo  which  will  apply  in  the  original  coordinates  to 
nonlinear  problems.  Such  an  expansion  would  be  equivalent  to  representing 
the  a  posteriori  density  by  a  nonorthogonal  series  of  gaussians. 

One  advantage  of  a  gaussian  sum  is  that  it  must  be  everywhere 
positive,  so  that  the  truncated  series  will  not  have  the  unfortunate 
characteristic  of  oscillating  negative/positive  instabilities.  On  the 
other  hand,  since  the  series  is  nonorthogonal  (on  1^)  it  happens  that 
the  determination  of  the  minimum-norm  (in  L2)  gaussian  sum  approximation 
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requires  inverting  a  large  matrix  of  inner  products.  Center  [6]  has 
described  the  generalized  least-squares  process  and  has  illustrated  how 
the  selection  of  coefficients  for  the  representation  amounts  to  solving 
an  unconstrained  minimum  norm  problem  in  L2  (i.e.,  an  orthogonal 
projection).  Thus,  the  optimal  choice  of  coefficients  for  a  non- 
orthogonal  basis  involves  a  substantial  amount  of  computation. 

Recognizing  this  computational  burden,  Alspach  [1]  proposed  a  sub- 
optimal  scheme  for  dropping  gaussians  until  a  suitable  fit  (called  a 
"theorem  fit")  is  obtained.  He  discusses  several  special  cases,  such 
as  the  tracking  problem  in  Chapter  VI,  where  simplifications  arise  from 
symmetry  of  the  density  functions.  Once  the  appropriate  fit  is  obtained, 
however,  there  remains  the  problem  of  implementing  Bayes  Law.  For  this 
Alspach  proposes  local  linearizations.  He  expands  both  the  driving 
noise  density  and  the  measurement  noise  density  in  suitable  gaussian 
sums,  then  essentially  considers  the  pairs  of  points  which  occur  as  a 
result  of  Bayes  Law,  linearizes  about  these  points  and  uses  partial 
realizations  of  linearized  Kclman-Bucy  filters  for  each  of  the  terms. 

The  result  is  an  increase  in  the  number  of  gaussians  during  the  Bayes 
Law  computation,  so  that  he  must  then  reduce  the  resulting  number  of 
terms  by  discarding  those  with  insignificant  weight. 

The  above  procedure  may  be  summarized  as  shown  in  Table  1  (adapted 
from  Alspach  [1],  pp.  101-110). 
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Table  1.  Outline  of  a  Gaussian-Sum 
Recursion  Procedure  [1] 


A.  Form  P  (SjJifc-i . as  a  gaussian  sum  approximation  in 

the  form 


Pxk(4lVl»-*“20)  =  2  ak.N(4~4  .»*£>  »  (102) 

where  P'  is  constrained  to  have  the  property  that  for  a  preassigned 


s  >  1, 

s 

so  that 

e  -  s2  p; 

k. 

l 

B. 

Linearize 

forming 

4 

L. 

l 

Then  Form  P  (£,  1  z, 

i 

4  <  E  for  all  i  e  [l,n£]  , 


k. 

l 


a  gaussian  sura  as 


V4K . V  ■  £  “k  • 


J  i 


(103) 


j  “  •  •  •  >  I). 


°k  *  4  ’ 


4.  =4.  +Kk.  [4  “4  (4,)j  ’ 

j  j  j  j 

Pk.=4-Kk.  V4>pk,  * 

j  j  j  j  j 


where 
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and  a,  =  — 
k. 

1 


K.  "<4  -  4(4  >•  W  4.  V4  >  +  V 


a _ 3  3 


Numerator 

j=l 


C.  Drop  insignificant  terms  to. reduce  the- combined  number  of 
gaussians. 


D.  If 


[wwk  >  Pk.  +  Qk] 

L  j  j  j  J 


Is2  ■  s2  K-n,  <  E  • 

i 


(104) 


f°™  p^i^1^ . V  as 


Px,  ,,  ^k+JV  ‘•*,z0)  =  Xrf  “k+l.  N^+l  “  %+!/  Pk+1 , ^  (105) 


k+1 


i=l 


where : 


nk+l  nk  ’ 


01k+li  =  C(ki 


%KL.  ~  ^k+l(iik  * 
l  i 


Pk+1.  =  Vfl^k^  Pki  ^k+l^k^  +  Qk 


and  go  to  step  F.  If  (104)  does  not  held,  go  to  step  E. 


E.  Given  a  prespecified  gaussian  sum  approximation  to  P^)  of 
the  form: 


Pu.  ^  “  2  >k  N(4  “  \  \  > 

k  n=l  n  n  n 


(106) 
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which  leads  to 


k 

E  \  N[^+i  -  w\>  -  wv 

n=l  ^  J 


<4 '  4.>  -  4  •  \  ] (107) 

3  n  n-J 


where  is  linearized  about  jj^  .  With  this  approximation,  if.  each 

k+1  stage  covariance  is  such  that 


S  Pk+1  s2  Kc+l^k.*  Pk.  $k+l(lJk.)a  +  Qk  ]  < 

L  3  3  3  nJ 


E  , 


(108) 


form  pxk+1<4+iUk’-*“-so> 


'k+l 

Pxk+l^k+1^“k . =  2  0lk+li  N(%-1  "  4.+li’  4+1^  (109) 


where 


nk+l  nk  qk  * 


“k+l.  ~  °k.  Yk  * 
i  3  n 


-Vi  "4+i^k.7 • 

i  3  n 

Pk+1  ~  \+l*"~k ?  Pk.  \+l(-4.  *  +  Qk 
i  J  3  j  n 


and  where  the  index  of  summation  comes  from  combining  the  terms  of 
p(^|*k>  anc*  such  that  i  =  j  +  (n  -  1)  nk*  Next  go  to 

step  F.  If  (108)  is  not  valid  go  to  step  G. 

F.  Update  the  time  index  k  to  k+l  and  return  to  step  B. 


G.  Form  p  (g.  -  |_z,  >  • .  •  as  *n  stcP  (105)  and  then  update 
*k+l  1 

the  time  index  k  to  k+1  and  return  to  step  A. 

The  basic  procedure  of  Table  1  has  been  used  in  an  example  in 
Chapter  VI,  where  certain^ simplifications  have  resulted  from  symmetry ^ 
(see  Chapter  VI) .  For  more  details  concerning  other  special  cases  and 
certain  accuracy  discussions  the  reader  is  encouraged  to  consult 
Alspach  [1] . 
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E.  Other  Computational  Methods 
1.  Fourier  Series  Expansions 


Whenever  the  conditional  densities  have  compact  support  and  are 

periodic  (for  example,  in  the  cyclic  version  of  the  problem  in  Chapter 

. .  ~  '  *  *•  . *  *  '  * 

VII)  it  is  frequently  possible  to  expand  both  sides  of  the  Bayes  Law 
equation  in  a  Fourier  Series.  Although  such  an  expansion  would  always 
be  theoretically  possible,  its  advantage  would  not  be  particularly  good 
unless  it  is  possible  to  obtain  (analytically)  a  closed  form  for  the 
updating  of  the  coefficients  through  Bayes  Law.  It  is  also  important 
that  proper  consideration  be  given  to  the  problem  of  truncation  of  the 
series,  since  (as  generally  will  be  the  case)  the  coefficients  for  the 
new  expansion  will  be  given  as  an  infinite  sum  of  the  previous  coefficients. 
For  example,  we  derive  in  Chapter  VII  (Appendix  D)  an  expression  of  the 
form 


'  V  c  n-1 

m£  ra-Q  ata-Z 
n _ a _ 

mZ  Y)  S  an_1 
>  -a  aa 


(110) 


where  an„  are  the  (two-dimensional)  Fourier  coefficients  for  expanding 
mx. 

the  conditional  probability  density  at  time  n,  and  the  m.  and  S 

X/  Q 

are  coefficients  resulting  from  the  derivation  of  the  recursion  formula. 
The  recursion  involves  two  infinite  suras.  Of  course  if  the  density  is 
uniform  then  only  one  coefficient  will  be  non-tero.  Thus,  in  contrast 
to  the  Gauss-Hermite  expansion,  the  easiest  situation  to  represent  by  a 
Fourier  Series  is  the  noise-saturated  case. 
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For  low-noise  applications,  however,  when  the  density  contains  a 
maximum  amount  of  structure  (i.e.,  information)  then  the  Fourier  Series 
will  begin  to  contain  many  high  frequency  components  so  that  the  recur¬ 
sion  formula  may  lead  to  high  frequency  instabilities.  We  are  currently 
studying  metnods  for  reducing  the  impact  of  the  problem  on  the  one  hand, 

and  for  quantifying  its  effects  on  the  other  hand.  We  will  report  these 

.  .  ....  •  ....  * 

results  at  a  later  date. 


2.  Spline  Functions 

It  turns  out  that  one  more  generalization  of  the  least-squares  (L2) 
approach  of  Center  [6]  leads  to  the  method  of  interpolating  splines,  as 
revealed  by  Weinert  and  Kailath  [12],  The  additional  generalization  is 
the  inclusion  of  constraints  in  the  minimization  process.  The  constraints 
take  the  form  of  knots  where  the  values  of  the  function  are  assumed  known. 
Typical Ly  the  knots  are  chosen  so  that  the  spline  residuals  are  minimized, 
but  the  optimal  choice  of  locations  can  often  be  a  difficult  proposition 

In  a  sense  the  spline  interpolation  is  a  combination  of  the  point- 
mass  method  with  the  least-squares  fitting  of  polynomials.  Thus  it  may 
be  expected  to  include  the  advantages  of  both  methods.  Preliminary 
experiments  by  deFigueiredo  and  Jan  [7],  [10]  appear  to  confirm  this 
conjecture,  although  more  definitive  experiments  are  called  for. 
DeFigueiredo  and  Jan  have  used  multivariate  fundamental  splines 
("B-splines")  for  the  Bayes  law  calculation  since  "they  are  nonnegative; 
their  integral  over  the  reals  is  unity;  and  they  give  rise  to  a  set  of 
basis  functions  (called  ’fundamental  functions')  with  minimum  support, 
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in  terms  of  which  a  continuous  nonnegative  function  is  approximated  by 
interpolation,  as  a  nonnegative  function"  [8]. 

Using  cubic  polynomials  as  basis  functions,  deFigueiredo  and  Jan 
have  considered  a  recursion  which  begins  at  stage  k  with  a  given  mesh 
and  the  basis  functions.  Then  they  compute  the  projections  of  the  given 
measurement  and  state-increment  probability  densities  onto  the  space 
spanned  by  the  basis  functions.  Then  by  substitution  of  the  resulting 
inner  products  into  the  Bayes  recursion  formula  an  expression  for  the 
aposteriori  density  function  in  terms  of  an  infinite  sum  results.  The 
next  step  depends  on  the  application.  If  the  old  mesh  is  adequate  for 
the  new  density,  then  an  expansion  is  made  and  the  process  is  repeated. 

If  the  old  mesh  no  longer  suffices  then  a  new  mesh  is  chosen,  either 
deterministically  (analogous  to  the  predictive  gridding  of  Bucy  and 
Senne  [6])  or  iteratively  (i.e.,  using  a  surface  searching  technique). 

In  any  event  the  spline  technique  involves  both  the  computational 
difficulties  and  the  advantages  of  'he  point-mass  and  leas  -squares 
series  representations.  The  method  promises,  however,  to  provide 
superior  performance  to  both  methods  given  the  same  number  of  computations. 
An  extensive  computational  study  will  be  required  to  determine  if  this 
conjecture  is  true.  We  regret  not  having  been  able  to  undertake  such  an 


experiment . 


77 


References 


[  1]  D.L.  Alspach,  "A  Bayesian  Approximation  Technique  for  Estimation 
and  Control  of  Time  Discrete  Stochastic  Systems,"  Ph.D.  Disserta¬ 
tion,  University  of  California,  San  Diego,  1970. 

[  2]  R.S.  Bucy,  "Bayes  Theorem  nd  Digital  Realizations  for  Non-Linear 
Filters,"  J.  Astro.  Sci.  ±7  (1969),  80-94. 

[  3]  R.S.  Bucy,  "Building  and  Evaluating  Non-Linear  Filters,"  To  appear, 
Proc.  Symp.  on  Appl.  Math.;  Stochastic  Diff.  Eqns.,  Amer.  Math.  Soc., 
April  1972. 

[  4]  R.S.  Bucy  and  P.D.  Joseph,  Filtering  for  Stochastic  Processes  with 
Applications  to  Guidance,  Wiley  Interscience,  New  York,  1968. 

[  5]  R.S.  Bucy,  R.A.  Geesey,  and  K.D.  Senne,  "A  Passive  Receiver  Design 
via  Nonlinear  Filtering  Theory,"  Proc.  Third  Hawaii  International 
Conf.  On  System  Sciences,  Vol.  I,  1970,  477-480. 

[  6]  R.S.  Bucy  and  K.D.  Senne,  "Digital  Synthesis  of  Nonlinear  Filters," 
Automaiica,  7  (1971),  287-298. 

[  7]  J.L.  Center,  "Practical  Nonlinear  Filtering  of  Discrete  Observa¬ 
tions  by  Generalized  Least  Squares  Approximation  of  the  Conditional 
Probability  Distribution,"  Proc.  Second  Symp.  on  Nonlinear  Estima¬ 
tion  Theory  and  Its  Applications,  San  Diego,  Sept.  1971,  88-99. 

[  8]  R.J.P.  de  Figueiredo  and  Y.G.  Jan,  "Spline  Filters,"  Proc.  Second 

Symp,  on  Nonlinear  Estimation  Theory  and  Its  Applications,  San  Diego, 
Sept.  1971,  127-138. 

[  9]  C.  Hecht,  "Synthesis  and  Realization  of  Nonlinear  Filters,"  Ph.D. 
Dissertation,  University  of  California,  1972. 

[10]  F.B.  Hildebrand,  Introduction  to  Numerical  v.alysis,  McGraw  Hill. 

New  York,  1956. 

[11]  Y.G.  Jan,  Ph.D.  Dissertation,  Rice  University,  1971. 

[12]  J.T.  Lo,  "Finite  Dimensional  Sensor  Orbits  and  Nonlinear  Filtering," 
Ph.D.  Dissertation,  University  of  Southern  California,  1969. 

[13]  H.L.  Weinert  and  T.  Kailath,  "Recursive  Spline  Interpolation  and 
Least-Squares  Estimation,"  submitted  to  .Amer.  Math.  Soc..  1971. 

[14]  N.  Wiener,  The  Fourier  Integral  and  Certain  of  Its  Applications, 
Cambridge  University  Press,  '.ambridge,  1933  (Also:  Dover,  New  York, 
1958)  . 


79 


IV.  Monte  Carlo  Analysis  Techniques 

A.  Introduction 

It  is  unfortunate  that  in  most  nonlinear  filtering  problems  of 
interest  the  analytic  performance  evaluation  of  the  estimates  is  as 
untractable  as  the  derivation  of  the  optimal  estimates  themselves. 
Naturally,  in  order  to  compare  alternative  estimates  an  error  evaluation 
procedure  of  high  quality  is  essential.  Thus  it  is  necessary  to  turn 
to  Monte  Carlo  simulation.  This  section  is  devoted  to  describing  the 
necessary  experiments  and  to  clearing  up  some  of  the  misconceptions 
and  misuses  of  the  Monte  Carlo  method. 

The  term  "Monte  Carlo  Simulation"  is  subjected  to  a  great  number 
of  alternative  and  sometimes  misleading  interpretations.  Some  authors 
claim  that  significant  conclusions  can  be  made  on  the  basis  of  averaging 
only  a  few  random  numbers.  Others  claim  only  that  Monte  Carlo  simula¬ 
tion  has  proved  that  one  technique  is  superior  to  another  without  any 
explanation  of  the  type  of  experiment  or  the  confidence  associated  with 
the  conclusions.  Other  investigators,  aware  of  the  cost  of  such 
simulations,  regret  to  say  that,  due  to  the  prohibit  expense,  they 
were  limited  to  2-10  Monte  Carlos,  but  then  they  come  to  conclusions 
completely  inconsistent  with  that  limitation.  The  result  is  that  the 
Monte  Carlo  simulation  has  become  the  most  prevalent  form  of  cheating 
with  statistics  in  the  engineering  literature.  Consequently,  we  intend 
to  reverse  that  trend  by  describing  precisely  what  our  simulations 
entailed  and  analyzing  tht  confidence  associated  v;ith  our  conclusions. 

Preceding  page  blank 


In  addition,  in  an  appendix,  we  present  our  random  number  generator, 
which  is  realizable  on  any  binary  computer,  so  that  any  reader  can 
convince  himself  of  our  conclusions  by  precisely  duplicating  them  if 
he  chooses. 

B.  Statistical  Analysis 

By  Monte  Carlo  simulation  we  mean  that,  in  order  to  estimate  the 

A 

moments  of  a  random  process  generated  by  subtracting  an  estimate  x^ 
of  a  variable  xr  from  the  true  value,  a  large  number  of  statistically 
independent  realizations  of  the  difference  are  generated  and  the 
statistic 


N 


i=l 


A  i  i. 

is  formed.  If  y  is  the  mth  moment  of  the  difference  x  -  x  for  all 
m  n  n 

i,  then  y  will  have  mean  y  as  expected  but  what  confidence  is 
m  m 

associated  with  the  estimate,  or,  equivalently,  how  large  should  N  be? 
The  statistic  (1)  is  asymptotically  normal  with  mean  and  variance  given 
by  [2] 


E[u  ]  =  y  , 
m  m 

(2) 

and 

2 

c "  ■>  ^2m  ^m 

var  '"m1  *  s 

(3) 

Since  the  statistic  is  Gaussian  we  may  compute  the  probability 
confidence  bands  for  the  estimates  for  large  N.  As  an  example,  consider 
the  probability  (0.9974)  that  a  gaussian  deviate  lies  within  three 
standard  deviations  of  its  mean.  Thus,  for  large  N,  we  claim  with 
probability  0.9974  that 
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An  equivalent  statement  to  (4)  is 


where  P  =  — =—  ,  provided  p  i  0. 
ni  in 


Now  the  bounds  in  (5)  are  asymptotic  results  but  we  may  still  use 

them  effectively  if  we  know  we  have  taken  N  large  enough.  One  technique 

is  to  compute  the  sample  distribution  of  p  (by  performing  M  Monte  Carlos 

m 

of  length  N)  and  test  its  normality  with  (say)  a  Chi-squared  or 
Kolmogoruv-Smimov  test.  More  reasonably,  an  upper  bound  on  the  value 
of  P^  may  be  computed  for  use  in  (5)  based  on  the  samples.  But  it  is 
definitely  true  that  any  confidence  band  of  fewer  than  three  standard 
deviations  would  be  at  best  subject  to  misinterpretation  since  the 
associated  probability  for  plus  or  minus  one  standard  deviation  is 
only  0.6836  and  only  0.9544  for  two.  It  is  unfortunate,  however,  that 
the  convergence  of  the  bound  (4)  is  extremely  slow  with  N.  If  we 
compute  the  total  width  W(Pm,N)  of  the  confidence  band  for  n^  standard 


deviations,  we  get 


W(P  ,N)  /?  -  1 

Tn  2V~iT“  * 

mo  1 


(6) 


Examples  of  the  normalized  confidence  widths  are  given  in  Table  1  for 

various  values  of  P  and  N.  It  can  be  seen  that  for  the  three-standard 
m 

deviation  confidence  width  to  be  less  than  0.2  p  ,  and  N  of  2000  is 

m 

needed  if  P  =  3,  N  =  5000  is  needed  for  P  =5,  and  N  =  10000  is  needed 
m  m 


% 


for  Pm  =  10.  Thus,  in  order  to  separate  two  Monte  Carlo  results  with 

high  confidence  (probability  0.9974)  if  the  numbers  lie  within  10%  of 

each  other,  an  extremely  large  number  of  runs  is  required.  Also,  the 

table  shows  that  absolutely  nothing  can  be  said  with  high  confidence 

if  fewer  than  20  Monte  Carlos  are  performed  (with  P  ■=  3)  or  50  Monte 

m 

Carlos  (with  P  -  6)  . 


P-l\  /z 


Table  1.  The  normalized  standard  deviation 
as  a  function  of  P  and  N 


N  P 

=  1 

2 

3 

4 

1 

0 

2.00 

2.83 

3.46 

2 

0 

1.41 

2.00 

2.45 

5 

0 

.0894 

1.26 

1.55 

10 

0 

.632 

.894 

1.10 

20 

0 

.447 

.632 

.775 

50 

0 

.283 

.400 

.490 

102 

0 

.200 

.283 

.346 

2x10  2 

0 

.141 

.200 

.245 

5x10  2 

0 

.0894 

.126 

.155 

10  3 

0 

.0632 

.0894 

.110 

2xl03 

0 

.0447 

.0632 

.0775 

5x10  3 

0 

.0283 

.0400 

.0490 

104 

0 

.0200 

.0283 

.0346 

2x10  4 

0 

.0141 

.0200 

.0245 

5xl04 

0 

.00894 

.0126 

.0155 

10  5 

0 

.00632 

.00894 

.0110 

2x10  s 

0 

.00447 

.00632 

.00775 

5x10  s 

0 

.00283 

.00400 

.00490 

10s 

0 

.00200 

.00283 

.00346 

2  x10  s 

0 

.00141 

.00200 

.00245 

5x10s 

0 

.000894 

.00126 

.00155 

10  7 

0 

.000632 

.000894 

.00100 

5 

6 

8 

10 

4.00 

4.47 

5.29 

6.00 

2.83 

3.16 

3.74 

4.24 

2.24 

2.00 

2.37 

2.68 

1.26 

1.41 

1.67 

1.90 

.894 

1.00 

1.18 

1.34 

.566 

.632 

.748 

.849 

.400 

.447 

.529 

.600 

.283 

.316 

.374 

.424 

.224 

.200 

.237 

.268 

.126 

.141 

.167 

.190 

.0894 

.100 

.118 

.134 

.0566 

.0632 

.0748 

.0849 

.0400 

.0447 

.0529 

.0600 

.0283 

.0316 

.0374 

.0424 

.0224 

.0200 

.0237 

.0268 

.0126 

.0141 

.0167 

.0190 

.00894 

.0100 

.0118 

.0134 

.00566 

.00632 

.00748 

.00849 

.00400 

.00447 

.00529 

.00600 

.00283 

.00316 

.00374 

.00424 

.00224 

.00200 

.00237 

.00268 

.00126 

.00100 

.00118 

.00134 

83 


In  case  the  hypothesized  actual  moment  u  is  zero  (for  example: 

m 

the  odd  moments  of  an  even-valued  density  function) ,  then  the  two-sided 
bound  of  (5)  must  be  replaced  (by  setting  =  C  :.n  (4)  )  with 


IV  inoVlT 


(7) 


Equation  (7)  will  be  used  to  test  the  accura:^  of  the  odd  moments  of  a 
gaussian  random  number  generator  below. 

A 

In  addition  to  testing  the  accuracy  of  de  overall  statistic  y^CN), 
we  are  also  interested  in  the  allowable  bf,t;’/iot  of  the  cumulative 

A 

average,  whereby  y  (N+l)  is  obtained  from  .  (N)  by 
m  m 


<r  r> 

yra(N+!)  -  N+i  bm(N)  +  vt-i 


m 


(8) 


Clearly  the  sample  paths  of  the  cumula^i/c  average  will  steadily  approach 
its  mean  y  with  fluctuations  gradually  aiminishing  asymptotically  to 
zero  as  predicted  by  the  bounds  (5)  3'd  (7).  Thus  if  the  sample  path 
has  been  stored  we  may  detect  statistically  questionable  samples  by 
placing  a  pair  of  confidence  bands  of  (say)  tvo  standard  deviations 

A 

around  the  entire  trajectory  of  y  (n),  nHl,***,N  using  the  assumed 
value  of  y^  equal  to  the  asymptotic  sample  moment  Frequently 

the  trajectory  of  yi^N)  will  reveal  programming  errors  whereby  an 
occasional  error  is  extremely  large  (for  example  -  by  dividing  an 
estimate  by  zero  unexpectedly).  It  is  important  to  determine  not  only 

A 

the  asymptotic  moment  ^(N) ,  then,  but  also  to  estimate  whether  any  of 

A 

the  individual  samples  deviates  unusually  far  from  Mm(N) .  A  typical 
plot  of  a  questionable  sample  path  is  given  in  Fig.  1.  The  behavior 
of  the  figure  suggests  a  way  in  which  bad  samples  (i.e»,  not  taken  from 


2a  CONFIDENCE  BANDS 
ABOUT  FINAL  VALUE 


i 

,  V? 
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the  correct  distribution)  may  be  identified  during  the  experiment.  We 

A  A 

simply  compute  the  variance  of  the  difference  p  (iW-1)  -  p  (N)  as 

m  m 

var[p  (N+l)  -  p  (N)]  «  var[6p  (N)]  =  — - — 7  var[p  (N)]  (9) 

mm  m  (W+l)‘  m 

As  an  example  of  the  use  of  (9)  and  Table  1,  consider  a  situation 

A 

where  p  (100)  =  3  and  we  are  interested  in  the  three-standard- 
m 

A  ^ 

deviation  allowable  change  at  N=10Q.  We  observe  that  varfp  (N) j  = 

m 

A  2  2 

p  (N)[W(P  ,N)/2]  .  Thus,  from  (9),  selecting  P  =  3,  we  compute 
m  m 

^2(100)  (W(3,100)/2j2  o2u  000W0-.2 

var [6p  (100)]  =  — - -  3  i-L283)/2 j-  1  (.00420) 2  .  (10) 

m  1012  1012 

From  (9)  we  see  that  the  three-standard  deviation  cutoff  for  |6p  (100)1 

m  1 

A 

is  .0126.  If  che  magnitude  6p  (N)  ever  exceeds  the  3o  value  we 

m 

have  good  reason  to  suspect  an  inconsistent  data  value.  If  at  the  point 
that  such  a  deviation  is  encountered  the  offending  number  is  flagged, 
set  aside,  and  not  included  in  the  cumulative  average.  A  post  experiment 
analysis  may  be  able  to  discover  if  indeed  a  programming  error  could  be 
responsible. 

C.  An  Example 

An  excellent  example  for  illustrating  some  of  the  above  concepts  is 
the  testing  of  the  gaussian  random  number  generator  which  was  used  for 
all  of  our  subsequent  Monte-Carlo  experiments.  A  description  of  the 
generator  is  given  in  complete  detail  in  the  appendix.  The  normal 
deviates  were  obtained  in  pairs  by  the  following  polar  transformation 
of  the  uniform  deviates: 


let  u  ,  u 
1  1 


be  independent  and  distributed  uniform  on  [0,1),  then 
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x2  =  (~loge  u^2  sin(2TiuJ  , 

(11) 

and 

1/ 

x2  =  (“loge  Uj)  1  cos(2itu2) 

(12) 

are  independent  and  distributed  gaussian  with  zero  mean  and  unit 
variance . 

First  we  make  a  quick  check  of  the  unifoirm  generator  alone,  testing 

5 

its  mean  and  variance  based  on  two  independent  Monte  Carlos  of  10 
samples.  Sequence  One  used  the  seed  735776465527^  (see  Appendix)  and 
Sequence  Two  used  the  seed  311037552421..  The  Monte  Carlo  results  of 

O 

central  moments  were  as  follows 


Sequence  One 

■^(lO5;  =  .498931-0.5  P2(105)  ■  .0834604 

(13) 

Sequence  Two 

<10S)  =  .500517-0.5  p2(10S)  =  .0833716 

(14) 

Now  since  Pj  =  0.0,  p2  =  0.0833...,  p*  =  0.03125,  and  p^  *  0.0125, 
we  \y  effectively  use  the  bounds  in  (4)  and  (7)  to  bracket  the 
allowable  deviations.  For  this  experiment  we  determine 


»  (9.1287  x  10-1*) 
o. 


(15) 


and 


i  . — 


'Vy2 


j  <n  L»P125~L0^3)_  b  __2.  °  n  (2.3570xl0~4)  (16) 

"  °zi  10 5  6  V  5x10s  °2 


The  number  of  standard  deviations  of  the  two  experiments  may  be  computed 
by  substituting  the  values  from  (13)  into  (15)  and  (14)  into  (16), 
giving 

n  n 


Sequence  One 


o. 

a2 

1.171 

0.537 

0.565 

0.162 

Sequence  Two 
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Note  that  all  four  results  are  consistent  with  the  hypothesized 
distributions.  Next  we  combine  the  results  of  both  experiments  by 
averaging,  generating  a  sequence  of  twice  the  length,  resulting  in 
p^xlO5)  =  .499^24-0.5  and  P2(2xlQ5)  =  0.083416.  The  calculations 
of  (14)  and  (15) (with  N  =  2xlOS)  now  yield  nQ  =  0.428  and  nQ  =  0.496, 
respectively,  also  consistent  with  the  hypothesis  that  the  numbers  are 
taken  from  a  distribution  with  mean  0.5  and  variance  0.0833.... 

The  above  results  were  hypothesis  testing  results  -  i.e.,  given  a 
hypothesized  distribution,  what  is  the  like’ihood  that  the  given 
samples  come  from  that  distribution?  Me  will  return  to  the  hypothesis 
problem  shortly,  but  meanwhile,  what  if  we  had  no  -pothesis?  We  would 
then  have  to  use  a  different  approach  to  establish  bounds  on  the  true 
moments.  We  would  take  sample  values  to  estimate  p2>  for  example,  and 
use  the  bounds  (5)  to  determine  the  allowable  range  for  p2  based  on 
the  sample  ^(2x10  ).  Taking  a  nominal  value  of  p2  from  this  range 
we  could  then  use  (7)  to  determine  whether  the  assumption  that  Pj  =  0 
were  reasonable.  If  Pj  =  0  were  not  reasonable  according  to  (7)  we 
would  finally  return  to  (4)  and  determine  a  more  plausible  range  for 
p  .  The  result  of  the  above  procedure  will  be  confidence  intervals 
about  each  of  the  moments  of  the  sampled  distribution. 

Turning  now  to  the  transformed  deviates  (hypothesized  gaussian) 
we  describe  a  more  extensive  set  of  experiments.  Using  the  transformation 
(10)  -  (11)  on  the  above  two  uniform  sequences  we  calculated  the  first 

g 

six  moments  of  the  samples  based  on  Monte  Carlos  of  length  N  =  5x10 
for  both  starting  conditions.  The  raw  sampled  moments  are  summarized 


in  Taole  2. 
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Table  2.  Monte  Carlo  Moments  of  Gaussian  Generator 


Sequence  One 

Sequence  Two 

Average 

Theory 

a 

h 

0.000036 

0.000052 

0.000044 

0 

h 

o.p^ssi 

0.99S315 

0.999583 

1 

h 

-0.000808 

-0.000639 

-0.000724 

0 

A 

>*4 

2.996913 

2.995235 

2.996074 

3 

A 

H5 

-0.011973 

-0.013211 

-0.012592 

0 

^6 

14.967856 

14.974867 

14.9713615 

15 

First  we  determine  if  the  averaged  moments  (column  3  in  Table  2) 
are  consistent  with  the  hypothesized  moments  based  bn  a  Monte  Carlo  of 
length  10  .  The  results  are  given  in  Table  3,  where  the  odd  moments 
are  evaluated  with  (7)  and  the  even  moments  with  (4).  As  can  be  seen 
from  the  table,  all  of  the  computed  standard  deviations  are  less  than 
1.3,  leading  to  a  high  degree  of  consistency  with  the  Gaussian  hypothesis 
for  the  first  six  moments.  In  order  to  compute  the  standard  deviations 
in  Table  3, the  moments  p  =  105,  p  *  945,  and  p  =  10395  were 

o  10  A  2 

Table  3.  Testing  the  Hypothesis  of  Gaussian  Moments 


Odd  Moments  Even  Moments 


1 

3 . 1623x10"** 

0.1391 

2 

4.4721xl0-1* 

0.9324 

3 

1.2247xl0“3 

0.5911 

4 

3.0984xl0"3 

1.2671 

5 

9.7211xl0~3 

1.2953 

6 

3. 1890x10' 2 

0.8980 

■i 

4 
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required.  The  standard  deviations  in  the  table  reflect  knowledge  of  the 
exact  moments  to  almost  four  digits  for  the  first  three  moments  and  to 
almost  three  digits  for  the  fourth  through  sixth  moments. 

Of  course  any  finite  number  of  sampled  moments  may  match  the 
hypothesized  moments  and  still  the  density  may  not  be  in  fact  gaussian. 
Thus  a  more  extensive  test  is  required  to  check  the  density  hypothesis 
itself.  Statistical  tests  such  as  the  Chi-squared  test  [3]  and  the 
Kolmogorov-Smimov  test  [4]  have  been  devised  to  study  this  very  problem. 
The  Kolmogorov  test  is  the  most  valuable  for  the  problem  at  hand  since 
it  is  both  asymptotically  efficient  and  distribution  free,  whereas  the 
Chi-squared  test  is  not.  Basically  the  Kolmogorov  test  begins  by  ,.on- 
structing  an  empirical  cumulative  distribution  function  from  the  samples 
x^,  assumed  ordered  such  that  <  x2  <  *“’  <  xn*  Assume  the  samples 
are  taken  from  an  unknown  distribution  U(x),  i.e.,  Pr(xi  <  x)  =  U(x). 

Then  construct  the  empirical  distribution  as  follows: 

(1  x  1  0 

let  H(x)  =  }  (heaviside  step  function)  (17) 

(0  x  <  0 

Then 

N 

fn(x)  4  nSHU“X1)  ’  (18) 

i-1 

where  x^  are  the  ordered  samples.  Suppose  we  are  interested  in  testing 
the  hypothesis  Hq,  where 

Hq  :  (J(x)  =  F(x)  (given)  .  (19) 

Kolmogorov  turned  the  H  problem  into  a  threshold  test  of  the  following 
form.  Determine  the  value  of 

K^  =  (N)  ^  sup  j  F  (x)  -  l\.(x)  |  , 

X 


(20) 
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then  reji  t  H  if  is  too  large.  In  order  to  determine  how  large 
could  be  expected  to  be  it  was  necessary  to  calculate  the  distribution 
of  K^.  Kolmogorov  originally  proved  that 

lim  Pr(KN<X)  =  *(A)  =  1-2^  (-l)j+1  s“2j  X 
N~  j-1 


(21) 


Later,  Massey  [5]  derived  a  recursive  method  for  obtaining  Pr(K^<A)  for 
all  N.  Table.  4,  taken  from  Massey's  paper,  illustrates  how  quickly  the 
distribution  converges  to  its  asymptotic  limit.  If  N  is  large  (say 
5000)  we  may  use  (21)  as  a  quick  test  for  K^.  For  a  plot  of  (21)  and  its 
derivative,  see  Fig.  2. 


Table  4 . 

PrOC^A) 

from  Massey 

[5] 

N 

o\ 

o 

II 

1.0 

1.1 

1.2 

1.3 

1.4 

10 

.66 

.78 

.85 

.91 

.95 

.97 

20 

.65 

.77 

.85 

.91 

.94 

.97 

30 

.65 

.76 

.85 

.90 

.94 

.96 

40 

.64 

.76 

.84 

.90 

.94 

.96 

50 

.64 

.75 

.84 

SO 

.63 

.75 

.84 

70 

.63 

.75 

.83 

80 

.63 

.74 

CO 

.607 

.730 

.822 

.888 

.932 

.960 

In  order  to  test  the  gaussian  generator,  then,  M  =  1000  sample 
paths  of  length  K  =  5000  were  generated  and  the  numbers  K^(i),  i**lt»**,M 
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were  calculated  (where  for  convenience  we  assume  K^(1)<K^(2)<* • *<K^(M)  ) 
Next  an  empirical  distribution  F  ,  .(A)  of  the  form  (18)  was  deter- 

VM) 

mined  and  compared  with  (21)  .  Table  5  shows  some  of  the  points  compared 
for  each  sample  path  of  the  generator.  Table  5  shows  conclusively  that 
not  only  is  the  maximum  K^(M)  consistent  with  the  hypothesis  Hq  (with 
F(x)  gaussian)  but  the  distribution  of  (i)  is  consistent  with  Hq. 
Thus  we  have  illustrated  in  effect  a  successful  second  order  Kolmogorov 
test. 


Table  Results  of  Kolmogorov  Test 


Sequence  One 

Sequence 

Two 

i/M 

y*> 

♦tyi)] 

Vi} 

♦  IKjjU)] 

0.1 

0.5688 

0.0973 

0.5823 

0.1132 

0.2 

0.6414 

0.1949 

0.6502 

0.2084 

0.3 

0.6975 

0.2845 

0.7090 

0.3038 

0.4 

0.7540 

0.3795 

0.7769 

0.4179 

0.5 

0.8215 

0.4904 

0.8319 

0.5068 

0.6 

0.8669 

0.5599 

0.8996 

0.6067 

0.7 

0.9412 

0.6616 

0.9788 

0.7065 

0.8 

1.0461 

0.7761 

1.0571 

0.7862 

0.9 

1.1914 

0.8831 

1.2007 

0.8881 

1.0 

1.8518 

0.9979 

1.8975 

0.9985 

We  now  know  enough 

about  the  generator 

to  state  with 

high  confiden: 

that 

the  distribution  of 

the  numbers  is  in 

fact  Gaussian. 

We 

also  will 

need  to  know  if  the  successive  numbers  from  the  generator  are  independent. 
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Of  course  the  test  for  independence  is  nothing  more  than  to  determine 
if  the  multivariate  densities  are  all  products  of  univariate  Gaussians. 
Such  a  test  would  be  prohibitively  costly,  due  to  the  large  number  of 
samples  required.  Thus  we  will  have  to  be  satisfied  by  demonstrating 
that  successive  samples  are  uncorrelated.  Accordingly,  we  compute  a 
sampled  correlation  function  defined  by 


J2(k)  "  f^k  £  <*n  - 


■k  i-k. 

> 


(22) 


i=k+l 


k  =  0,1 .  The  sampled  correlation  function  just  reduces  to  the  auto¬ 

correlation  u2  of  (1)  for  k  0.  Using  the  two  sample  functions  of 

the  generator  we  have  obtained  values  of  (21)  for  k  =  1 . 4.  Refer 

to  Table  6  for  these  experimental  results. 


Table  6.  Sampled  Correction  Function 


Sequence  One 

Sequence  Two 

Average 

k 

h2(k) 

u2(k) 

P2(k) 

1 

-0.000588 

-0.C00131 

-.000360 

2 

-0.000031 

0.000160 

.000065 

3 

-0.000790 

0.000568 

-.000111 

4 

0.000588 

-0.000129 

.000230 

Now 

the  variance  of  (21) 

is  given  by 

vax 

2 

W2 

=N^k 

(23) 

we  may  determine  the  number  ot  standard  deviations  associated  with 

(say) 

the  average  of  the 

correlations  :n  Table  7, 

7 

whereby  N  =  10  . 

r-Vt'rr 


f-TRU- •:■"-* 
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The  results  are  given  in  Table  7»  where  we  see  that  uncorrelatedness  is 
a  highly  consistent  hypothesis  for  the  generator,  since  the  number  of 
standard  deviations  is  consistently  less  than  1.2. 

Table  7.  Uncorrelatedness  Test 


k 

.  ■( 

\N-k  / 

m2(W 

n  - - 

0  0 

1 

3.1623xl0“4 

1.1384 

2 

3 .1623x10“ 4 

0.2055 

3 

3<1623xl0“4 

0.3510 

4 

3.1623xl0“4 

0.7273 

D.  Conclusions 

It  has  been  seen  that  a  systematic  use  of  standard  statistical 
results  can  lead  to  a  qua  .cative  level  of  confidence  associated  with 
a  Monte  Carlo  analysis.  Since  the  latter  is  of  such  great  importance 
in  evaluating  the  performance  of  nonlinear  estimators,  it  goes  almost 
without  saying  that  no  Monte  Carlo  results  should  be  published  without 
a  high  confidence  assessment  of  their  accuracy. 

Naturally  an  important  characteristic  of  simulations  is  repro¬ 
ducibility.  Thus  the  machine-independent  random  number  generator 
analyzed  above  will  be  of  great  service  to  those  engaged  in  comparative 
analyses  of  several  candidate  filters,  since  the  simulations  need  not 
necessarily  be  run  on  the  same  computer. 

Tbe  two  starting  numoers  provided  for  the  random  number  generator 
in  this  paper  yield  two  indepen ’ont  sequences,  both  of  which  are  useful 


in  stochastic  systems  simulation.  Furthermore,  the  characteristics  of 
the  Initial  segments  of  the  two  sequences  are  sufficiently  different 
so  as  to  provide  complementary  tests  of  an  estimator  with  only  two 
relatively  short  Monte  Carlos. 
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Appendix.  A  Machine  Independent  Random  Number  Generator 

A.  Introduction 

A  continually  recurring  problem  associated  with  Monte  Carlo 
simulations  on  digital  computers  is  the  lack  of  reproducibility  of 
results.  The  simulation  performed  by  one  researcher  on  one  computer 
must  essentially  be  believed  by  others  without  access  to  the  same 
machine.  This  problem  arises  primarily  from  the  fact  that  (1)  generally 
the  random  number  generator  used  on  a  given  system  has  been  optimized 
for  speed  and  efficiency  by  system  programmers,  (2)  the  generator 
depends  in  subtle  ways  on  the  particular  idiosyncrasies  of  the  machine, 
and  (3)  rarely,  if  ever,  does  sufficient  documentation  exist  so  that 
the  user  can  unravel  the  secrets  of  the  method  employed.  This  is  not 
to  say  that  such  generators  are  not  reliable  but  only  that  they  are  not 
reproducible.  Thus  it  is  frequently  impossible  for  various  researchers 
to  compare  their  results  for  the  same  problem  on  different  machines. 

It  is  the  purpose  of  this  appendix  to  propose  a  solution  to  the  problem  - 
to  sacrifice  speed  and  efficiency  for  reproducibility,  independent  of 
the  word-length  of  the  binary  machine  employed.  The  generation  method 
is  simply  a  congruence  method  [1]  with  the  factors  appropriately  seg¬ 
mented  so  that  numerical  overflows  can  be  computed  without  hardware 
overflows,  since  the  latter  can  lead  to  machine-dependent  results.  It 
turns  out  that  a  convenient  overall  length  for  the  equivalent  random 
numbers  is  36-bits,  since  such  numbers  can  be  evenly  partitioned  into 
1,  2,  3,  6,  or  9  pieces  and  yet  the  cycle-length  of  the  generator  is 
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adequate  for  almost  any  conceivable  experiment.  It  is  demonstrated  in 
this  article  that  simple  FORTRAN-II-compatible  coding  exists  for  the 
multipart  generators  so  that  reproducibility  can  be  effectively  achieved 
with  no  special-purpose  coding,  regardless  of  the  machine.  It  is  impor¬ 
tant  to  keep  in  mind  here  that  speed  is  not  the  design  criterion  which  is 
being  considered  in  this  article.  However,  many  shortcuts  could  be  made 
by  using  machine  coding  for  the  generators.  An  example  of  the  36-bit 
generator  is  given  with  test  results  for  two  starting  numbers.  The 
reader  is  invited  to  choose  the  generator  having  the  appropriate  number 
of  segments  for  his  computer  and  reproduce  exactly  the  results  contained 
here  -  to  demonstrate  the  purpose  of  the  paper.  It  is  clear  that  tK 
same  segmentation  principle  will  work  on  any  deterministic  generator 
which  is  defined  by  standard  mathematical  transformations. 

I .  The  Congruence  Method 

It  is  desired  to  produce  a  long  sequence  of  random  variates  {x^} 
which  can  pass  any  standard  test  for  randomness  and  are  distributed 
essentially  uniformly  on  the  interval  [0,1).  A  common  method  involves 
the  calculation,  for  some  length  m,  a  sequence  of  integers  {ln} 
between  0  and  2m-l  by  modular  arithmetic  and  obtaining  the  desired 
real  variables  xn  by  dividing  the  ln  by  2m.  For  example,  let 
ln+^  be  obtained  from  1^  by  the  operation 

1  ,  =  (a  1  +  b)  mod  2m  (A-l) 

n+1  n 

While  many  combinations  for  a,  b,  and  m  have  been  proven  successful 
[1],  one  of  the  more  common  combinations  involves  letting  b  «*  0  and 

g 

taking  a  to  be  of  the  form  r2  +1,  where  s  ^  2.  The  multiplier  r 
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need  only  be  chosen  so  that  the  number  of  significant  bits  in  r2S±l 
is  approximately  m.  In  this  way  each  application  of  (A-l)  will  force 
1R+^  into  the  next  cycle,  leaving  little  or  no  correlation  between 
adjacent  numbers.  For  this  article  we  shall  choose  a  =  8v  -  1,  or 
any  m-bit  number  with  the  last  three  bits  equal  one.  Thus  we  are 
left  with  the  choice  of  m.* 

The  common  choice  for  the  word-length  of  the  1  is  the  machine 
word  length.  Such  a  choice  automatically  makes  the  generator  machine 
dependent,  since  the  modular  arithmetic  contained  in  (A-l)  will  usually 
result  in  arithmetic  overflows,  the  result  of  which  is  not  predictable 
in  general  for  all  machines.  Consequently  we  will  consider  segmenting 
the  numbers  into  the  form  (m  is  evenly  dividable  by  q) 

m(q-l)  m(q-2) 

1  -  l1  2q  +  l2  2q  +  •••  +  lj  2°  (A-2) 

n  n  n  n 

If  we  then  segment  a  into  the  analogous  pieces 

m(q-l) 

a  =  a1  2q  +  • ♦ •  +  aq  2°  ,  (A-3) 

we  may  perform  the  operation  (A-l)  as  follows:  First  we  compute 

2ra(q-l) 

al  =  a1 -l1  2  q 
n  n 

2m(q-2) 

+  (a1  l2  +  a2  1*)2  q  +  . .. 

n  n 

2m 

+  <>“"1  +  Cu  2  ’  -  ““  2°  <*-4) 


*  See  [1]  for  an  equivalent  discussion  for  decimal  machines.  The 
proposed  technique  applies  only  to  binary  machines. 
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Next  wo  observe  that  ail  but  the  last  half  of  the  center  term  and  the 
rest  of  the  lower  order  terms  are  eliminated  when  we  take  the  modulus 


relative  to  2m,  leaving 

,  -)  "I3-] 

1  =  (a1  lq  +  a2  lq  x  -f  *  •  •  +  aq  l1)  mod  2q  ;  2  q 

n+1  I  n  n  n  1 


q-1 


+  (a2  1 1  4-  a3  l' 

'  n  n 


+  • * ■  +  aq  lq  2°  , 

n 


+  aq  i2)  2  q  2 


(A-5) 


where  [•]  is  the  usual  bracket  (integer  part)  function.  Finally,  we 

see  that  the  last  half  of  the  last  term  in  (A-5)  is  Ci-  the 

remainder  of  the  last  term  plus  the  last  half  of  the  next  to  last  term 

is  lq~L  ere.  We  summarize  the  algorithm  involved  to  identify  the 
n+1 

lc  k 

pieces  of  1  ,,  in  Fig.  A-l,  where  it  is  assumed  that  a  and  1 
n-tl  n 

are  defined  from  the  previous  operation  or  initially  by  a  suitable  seed. 

We  see  from  studying  (A-5)  that  the  maximum  precision  required  to 
accomplish  the  update  is  only  2q-bits  plus  the  necessary  carry-over 
bits  to  add  up  q  numbers  of  length  —  and  one  number  of  length  JE, 

q  m  m  q 

This  exact  number  of  bits  is  p  =  [log^{q(2q  -  1)2+  2^  -  1}]  +  1, 
where  the  bracket  function  has  again  been  used  again  to  denote  the 
largest  integer. 

B.  An  Example 


In  this  section  we  introduce  a  specific  example  of  the  generator 
for  m  =  36,  which  is  chosen  since  it  is  sufficiently  long,  and  yet 
evenly  dividable  by  q  ■=  2,  3,  4,  6,  9,  etc.  Thereby  providing 


*S5»> 


n 

$ 

i 
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significant  flexibility.  The  choice  of  q*  is  determined  primarily  so 

that  the  operation  (A-5)  does  not  lead  to  integer  overflows  on  the 

machine  concerned.  Table  A-l  summarized  the  hardware  requirements 

for  several  combinations. 


Table  A-l.  Partition  Requirements 
for  m  =  36  bit  random  numbers 


q  =  No.  of  Pieces 


—  =  Bits  per  Piece 

q 


Bits/(positive)  integer 


1 

36 

72 

2 

18 

37 

3 

12 

26 

4 

9 

20 

6 

6 

15 

9 

4 

11 

12 

3 

10 

18 

2 

8 

36 

1 

6 

The  number  of  carry  bits  increases  at  such  a  rate  that  it  is  not 
very  efficient  to  divide  the  generator  into  more  than  about  six  or  nine 
parts,  but  the  principle  remains  the  same  regardless. 

In  order  to  initialize  the  generator  it  is  neceasary  to  provide  q 
pieces  of  a  suitable  seed  (i.e.,  one  that  leads  to  a  reliable  sequence). 
Table  A-2  lists  the  multiplier  a  =  7 35 77646552 7 g  ai  d  the  initial 


*  Of  course  the  individual  pieces  need  not  necessarily  be  the  same 
length  but  this  choice  makes  the  algorithm  simplest  to  code. 


numbers  iQ  =  a  (Sequence  One  and  1q  =  311037552421g  (Sequence  Two) 
for  several  different  dichotomlzations. 


Table  A-2.  Partition  Examples 


No,  of  Pieces  a(lfl  Sequence  One)  1  for  Sequence  Two 


2 

735776 _  -  244734  „ 

311037  = 

102943 

8  10 

8 

10 

4655??.  =  158551 

552421  = 

185617 

8  10 

8 

10 

3 


4 


6 


735  7 

=  3823 

3110 

=  1608 

8 

10 

8 

10 

7646 

=  4006 

3755  -  2029 

8 

10 

8 

10 

5527 

=  2903, „ 

2421  =  1297 

8 

10 

8 

10 

735 

*  477, „ 

311  = 

201 

8 

10 

8 

10 

77& 

=  510, „ 

037  - 

031 

8 

10 

8 

10 

465 

U 

LO 

O 

VD 

552  = 

362, 

8 

10 

8 

10 

527 

O 

“  343, „ 

421  = 

273 

73D 

= 

59,  „ 

31 

S 

25 

8 

10 

8 

10 

57. 

- 

47 

10 

IS 

08 

8 

10 

8 

10 

76„ 

zz 

62 

37 

-3 

31 

8 

10 

8 

10 

46  o 

SI 

38 

55 

-- 

45, 

8 

10 

8 

10 

55. 

s 

45, „ 

24 

= 

20 

g 

10 

8 

10 

27 

= 

23, „ 

21 

— 

17 

3 

10 

8 

10 

Table  A-3  contains  the  first  250  octal  numbers  resulting  from 
Sequence  One  (equivalent  to  the  seed  1^  =  1  with  one  number  discarded) 
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and  Table  A-4  contains  the  initial  250  octal  numbers  from  Sequence  Two. 
The  statistics  for  these  two  sequences  is  given  in  the  main  part  of  this 
article.  The  cycle  length  of  the  generator  may  be  calculated  theoreti¬ 
cally  as  follows:  The  next  to  last  octal  digit  repeats  every  8  steps, 
the  third  from  last  every  64  steps,  etc.,  so  that  the  first  digit  repeats 
every  811==  233  ss  8.59xl09.  In  practice,  however,  the  maximum  cycle  can 
only  be  obtained  for  certain  seeds,  thus  a  check  must  be  made  t'. 
guarantee  that  the  cycle  is  sufficiently  long.  Such  a  test  has  been 
run  on  the  two  given  starting  numbers,  confirming  a  cycle  length  of 
greater  than  10  for  both.  The  repeat  characteristics  for  the  initial 
segment  of  each  sequence  is  given  in  Table  A-5. 
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Table  A-3.  Initial  Sample  Path—  Sequence  One 


0/  735776465527 
546011324661 
10/  605007 1C5407 
124027314201 
20/  176553134667 
767051140121 
30/  003422023547 
717370070441 
40/  004604602027 
051575575361 
50/  715232077707 
120347726701 
60/  372342545167 
337314554621 
70/  122324612047 
001636347141 
80/  336670306327 
605537556061 
90/  221123262207 
316227251401 
100/  226525145467 
317604301321 
110/  142603170347 
135622135641 
120/  524644402627 
467414246561 
130/  700055634507 
132422504101 
140/  373255535767 
106715136021 
150/  376370336647 
776340234341 
160/  364024067127 
651040447261 
170/  404344377007 
750206646601 
180/  201227116267 
403544102521 
190/  166317275147 
470725643041 
200/  606622143427 
630171357761 
210/  333362331307 
370540721301 
220/  312174466567 
255106157221 
230/  2721430234 17 
304377561541 
240/  254151607727 
677064000461 


310473353621 
742044723047 
133551226141 
037341537327 
434502115061 
101236233207 
655044670401 
646162236467 
613250400321 
671013001347 
455020314641 
152627333627 
227000105561 
116564305507 
374157423101 
775630326767 
015736535021 
510277647647 
706711713341 
633530520127 
633535606261 
741056550007 
032153065601 
375327407267 
0422C  300 1 52 i 
3657363061^7 
401342622041 
641260274427 
333470016761 
252510202307 
353204440301 
613632457567 
502332356221 
406301534447 
726750040541 
177751440727 
303653737461 
210374024607 
711550723001 
260414520067 
470332042721 
665204552747 
213646567241 
750417175227 
445426370161 
23P505037 1 07 
t'5oC051 15501 
054030550367 
3f,  6426637421 
515002361247 


167041756507 
640355342101 
3502321 17767 
474341134021 
430116160647 
560164372341 
716424151127 
445053745261 
515637721007 
575260304601 
045756700267 
431002700521 
355564317147 
775360601041 
707405425427 
612307455761 
570205053307 
005511157301 
266117450567 
050337555221 
465147245447 
467421317541 
327540271727 
520464676461 
110414375607 
666544742001 
445347211067 
607664541721 
277101763747 
545643346241 
525057526227 
673720627161 
223461110107 
553760434501 
371240741367 
614376636421 
252737272247 
401663704741 
43247  6352527 
334470267661 
711256012407 
207031037201 
510047461667 
353573043121 
175732370547 
737217553441 
440027567027 
467110440361 
414376104707 
427513351701 


473777560041 

062221556427 

743450114761 

3  '-50724307 

26667.667  630 * 

042233441567 

276125754221 

157523  756447 

2651 7307654 2 

454316122727 

144316635461 

666303746607 

27110176100. 

720430702067 

307500240721 

367006174747 

777441125241 

1 5r,007057227 

775734066161 

3306U4161107 

205774753501 

35110013236? 

113327635421 

721203203247 

527374763741 

272007403527 

763355026661 

631244563407 

442014656203 

207274352667 

721731342121 

631546001547 

023534132441 

052132320027 

674336477361 

517640355707 

777736470701 

057370563167 

653502156621 

652411370047 

714713611141 

721112624327 

254315660061 

136662340207 

450437213401 

146437763467 

231117103321 

064610546347 

606030577641 

515343520527 


712641373067 
526574737721 
305721405747 
350037704241 
757225410227 
652470325161 
563076232107 
353052272501 
124366323367 
042241634421 
150756114247 
046407042741 
550107434527 
241460565661 
434702334407 
326541475201 
146450243667 
34055064 1121 
1701  704 1  i:  547 
264051511441 
543324051027 
115505536361 
015051626707 
230422607701 
712242154167 
265316755621 
642713501047 
554424470141 
627606055327 
265216217061 
463257311207 
534352632401 
611617054467 
364021202321 
657002357347 
125624756641 
610550451627 
443151407561 
702116163507 
314326565101 
207731744767 
710254537021 
62.’  370025647 
707667555341 
336506436127 

716164310261 
156701226007 
47040342760 1 
030455625267 
336136203521 
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Table  A-4.  Initial  Sample  Path  -  Sequence  Two 


0/  311037552421  222760501707  553141172641  645504742267  421051453461 

473364616247  332563504701  566217475627  113326416521  436576550607 

10/  542451520741  731701007167  410550523561  125147221147  002327337001 

547717776527  273546072621  006643307507  135125057041  311217344067 

20/  675431003661  234703714047  341176601101  446056167427  732017356721 

053565536407  770664425141  113267170767  143633473761  024401476747 

30/  502132453201  610621250327  320523453021  336714455307  717531003241 

106377305667  335621374061  512727351647  160233735301  522640221227 

40/  473763357121  632477064207  377623371341  747756712567  703733504161 

023674314547  676663627401  551702062127  062360073221  672044163107 

50/  663364767441  333714607467  154633024261  077747347447  421123131501 

416435613027  514612417321  061222752007  251116375541  205237774367 
60/  365060554361  124517472347  542553043601  024232233727  655503553421 
505142230707  273261013641  466057451267  107475514461  104253705247 
70/  517054365701  431336544627  411734517521  103451177607  150555441741 
547002416167  045242664561  046163210147  632010320001  677121745587 
80/  260526273621  473076636507  534025100041  357537653067  455201244661 
356334603047  667257662101  341433036427  222761657721  263271765407 
90/  764170546141  07651c377767  344472034761  445737265747  123023634201 
317441017327  114060054021  544561604307  314251224241  621225414667 
100/  676556035061  642462040647  731145216301  603012670227  051322060121 
760375113207  103367712341  053673721567444776245161  077113703547 
110/  376025210401  330477431127  062330674221  230663112107  004765410441 
621410516467  577013665261  223143636447  505724612501  o07746062027 
120/  460605320321  370652601007  220163116541  274202603367  07236751536! 

253160661347  615224624601  633345402727  670130554421  674272757707 
130/  300601634641  524761160267  723142555461  700451774247  220606246701 
050344613627  456423620521  130172626607  526162362741  353333025167 
140/733056025561  361406177147  565032301001  007112714527  323667474621 
302101165507  174326121041  056207162067  402172505661  771074472047 
150/  117001743101  712476705427  312205160721  450445214407  523175667141 
042774606767  436651375761  747304054747  253456015201  47064756632.7 
160/  435775455 72 !  652775733307  566572445241  536002523667  465133476061 
731123527647  027717477301  602454337227  315341561121  021542142207 
170/  207035233341  472437730567  577562006161  660142272547  251187571401 
573464000127  321062475221  605651041107  660367031441  422633425467 
.180/  567015526261  313047125447  604567273501  722345331027  443461221321 
111351430007  463330637541  567574412367  731617456361  275231050347 
190/550037405601  510447551727  711536555421  557372506707  754323455641 
551211667267  264630616461  007157063247  460601127701  358241662627 
200/  732173721521  223563255607  334070303741  616112434167  553012166561 
3 1 6040 1 66147  144415262001  106672663527  206211675621  622652514507 
210/  257230142041  554005471067  275404746661  623743361047  671165024101 
650231554427  040711461721  102267443407  506704010141  237302015767 
220/323351736761  75"G57643747  634651176201  113445335327  447274056021 
050561062307  47'.  7 1466624 1  023506632667  502132137061  505474216647 
230/  275332760301  £30405006227  511042262121  065156171207  573403554341 
473032737567  4f5C66547161  575777661547  221173152401  131637347127 


240/  157375276221  1 '  i  i'C 5770107  246771452441  50660533^467  325640367261 
676461414447  726*72754501  365033600027  706216122321  732117257007 
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Table  A-5.  Repeat  Characteristics 
of  the  Generator  for  each  Sequence 


rcle 

Sequence  One 

Sequence  Two 

0 

735776465527 

8 

311037552421^ 

81 

062221556427 
— 8 

113326416521^ 

82 

650107434527 
- 8 

6555035534218 

83 

617512155527^ 

514633562421q 

84 

65527 

8 

524218 

85 

465527. 

8 

552421q 

86 

6465527 

8 

7552421. 
- 8 

The  final  concern  in  describing  the  generator  is  to  give  examples 
of  coding  the  generator.  These  examples,  written  in  FORTRAN  II- 
compatible  code,  assume  no  special  hardware  characteristics  except  that 
only  the  applicable  number  of  pieces  has  been  selected.  It  is  further 
assumed  that  the  subroutine  remains  core-resident  so  that  all  locally 
defined  variables  remain  unchanged  between  calls.  If  the  subroutine 
must  be  dynamically  reloaded  on  call  (either  because  of  load  on  call 
restrictions  or  virtual  core)  then  the  locally  defined  variables  must 
be  stored  globally  in  COMMON.  We  leave  this  modification  to  the  user. 
Table  A-6  contains  coding  examples  for  two,  three,  four,  and  six-piece 
generators.  No  optimization  has  been  done. 
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Table  A-6.  FORTRAN-11  Coding  Examples 


Two-Piece  Generator 


FUNCTION  rtANFCNS, MODE ) 
DIMENSION  NS(1),NCC2> 
IF  (MODE)  10,100,10 
10  Ml *244734 
M2=158S51 
N 1 =NS ( 1  ) 

N2=NS<2) 

T 1*2. **(-18) 

T2=2.**( -36) 

MP=2**18 

100  DO  200  1=1,2 

00  TO  C 1 10, 120), I 
110  K=M2*N2 
00  TO  190 

120  K=M1*NH+M2*N1+KD 
190  KL  =K/MR 
200  NCCI )=K-Kb*MR 
N1=NC(2) 

N2=NC< 1 ) 

AN 1 =N 1 
XK2=N2 

hANF=XNl *1 1+XN2*T2 

HE TURN 

END 


Three-Piece  Generator 


FUNCTION  RANF ( NS > MODE ) 
DIMENSION  NSC  1 )*NC(3) 

IF  (MODE)  10, 100,10 
10  Ml =3823 
M2 =4006 
M3 =2903 
N1 =NS  < 1  ) 

N2=NS  <2 ) 

N3=NS(3 ) 

T 1  =2 •**( -1 2 ) 

T2=2 ♦  **( -24 ) 

T3=2.**< -36) 

MP=2**1 2 
100  DO  200  1=1,3 

GO  TO  C110, 120, 130), I 
110  K=N3*M3 
GO  TO  190 

120  K=N3*M2+N2*M3+KD 
GO  TO  190 

130  K=N3*Ml +N2+M2+N1 *M3+KD 
190  KD=K/MP 
200  NC < I ) =K-KD*MP 
N 1 =NC  <  3  ) 

N2=NC(2 ) 

N3  =NC ( 1  ) 

XN 1 =N 1 
XN2=N2 
XN3  =N3 

RANF=XN1 *T 1 +XN2*T2+XN3*T3 
RETURN 
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Four-Piece  Generator 


FUNCTION  HANF(NS*MODE) 

DIMENSION  NS ( 1 ) *  NC ( 4 ) 

IF  (MODE)  10*100*10 
10  Ml =477 
M2 =5 1 0 
M3 =3 09 
M4  =343 
N 1 =NS ( 1  ) 

N2=NS ( 2 ) 

N3=NS(3 ) 

N4 =NS ( 4  ) 

T 1 =2 •**( -9  ) 

T2=2 • **( -1 8  ) 

T3=3  »**( -27  ) 

T4=4.**(-36) 

MP=2**(9) 

100  DO  200  1=1*4 

GO  TO  (110*120*130*140), I 
110  K=N4*M4 
GO  TO  190 

120  K=N4*N'/+N3*M4+KD 
GO  TO  190 

130  K=N4*M2+N3*M3+N2*M4+KD 
GO  TO  190 

140  K  =N4  *M 1 +N3  *  M2  +N2  *N3  +N 1 *M4  +KD 
190  KD=K/MP 
200  NC ( I ) =K~KD*MP 
N1 =NC (4 ) 

N2=NC (3 ) 

N3=NC(2 ) 

N4=NC( 1 ) 

XN1=N1 
XN2=N2 
XN3 =N3 
XN4  =N4 

hANF=XNl  *T1  +XN2*T2+XN3*T3+XN4*T4 

liETUhN 

END 


Six-Piece  Generator 


FUNCTION  RANF ( NS , MODE ) 

DIMENSION  NS  ( 1  )  ,  NC  (  6  ) 

IF  (MODE)  10,100,10 
10  Ml =59 
M2  =47 
M3  =62 
M4=38 
M5  =45 
M6=23 
N 1 =NS (  1  ) 

N2=NS (2 ) 

N3 =NS  <  3  ) 

N4 =NS(4) 

N5  =NS ( 5  ) 

N6=NS(6) 

Tl=2.**(-6> 

T2=2.**(-12) 

T3=2.**(-18) 

T4=2.**(-24) 

T5=2.**(-30) 

T6=2.**(-36) 

MP=2**( 6 ) 

100  DO  200  1=1,6 

GO  TO  (110, 120, 130, 140, 150, 160), I 

110  K=N6*M6 
GO  TO  190 

120  K=N6*M5+N5*M6+XD 
GO  TO  190 

130  K=N6*M4+N5*M5+N4*M6+KD 
GO  TO  190 

140  K=N6*M3+N5*M4+N4*M5+N3*M6+KD 
GO  TO  190 

150  K=N6*M2+N5*M3+N4*M4+N3*M5+N2*M6+KD 
GO  TO  190 

160  K=N6*M1 +N5*M2+N4*M3 +N3*M4 +N2*M5+N 1 *M6+KD 

190  KD=K/MP 

200  NC ( I ) =K-KD*MP 
N1 =NC (6) 

N2=NC ( 5 ) 

N3  =NC  (4 ) 

N4  =NC ( 3  ) 

N5=NC ( 2 ) 

N6=NC (  1  ) 

XN 1 =N 1 
XN2=N2 
XN3 =N3 
XN4  =N4 
XN5=N5 
XN6=N6 

RANF  =XN  1  *T  1 +XN2  *T2+XN3  *T3  +XN4  *T4  +XN5  *T5  +XN6*T  6 

RETURN 

END 
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The  coding  of  Table  A-6  can  be  optimized  as  much  as  desired  so 
long  as  the  same  calculations  are  performed.  This  optimization  will  be 
a  function  of  the  timing  considerations  for  each  machine  so  that  there 


is  no  point  is  describing  it  further  here. 

The  application  of  the  generator  involves  simply  supplying  the 
q-part  generator  with  a  q-element  array  NS(*)  containing  the  numbers 
taken  from  Table  A  for  the  first  call  with  MODE^O.  Subsequent  calls 
are  taken  with  M0D£=0. 

In  order  to  transform  the  uniform  numbers  into  gaussian,  the 
coding  of  Table  A-7  can  be  used.  This  coding  was  used  for  the  tests 
described  in  this  article.  The  initialization  is  done  the  same  as  for 


ft 


RANF  through  the  array  NST(*)  when  MODE#).  The  subroutine  generates 
two  numbers  every  other  time  and  recalls  the  unused  one  in  between. 
The  returned  variable  x  is  gaussian  with  mean  0  and  standard 
deviation  1.  Again,  if  the  storage  is  not  protected  when  the  sub¬ 
routine  is  not  active,  it  is  necessary  to  save  the  status  words 
XR( • ) ,  J,  and  TWOPI  in  COMMON. 
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Table  A- 7 .  FOKTHAN-II  Coding  of  Gaussian  Generator 


FUNCTION  GAUSS(NST>MODE > 
DIMENSION  NST (  1  )  j  XRC2) 

IF  (MODE)  10,20*10 
10  J=2 

IV;  OP  I  =8  •  *AT  AN  ( 1  ) 

Xh  Cl) =RANF ( NST  >  1  ) 

GO  TO  35 

20  GO  TO  (30>40 ) ,  J 
30  J=2 

XK(1 )=KANF(NST,0) 

35  XRC2)=RANF(NST,0) 

XI  =SGHT(ABS  (  -2  #*ALOG  (XRC  1  )))) 
X2  =T W OP I *XR ( 2 ) 

Xh(l )=X1*SIN(X2) 

XR  (  2  )  =X  1  *C  OS  (  X2  ) 

GAUSS =XR( 1  ) 

RETURN 
AO  J=1 

GAUSS =XR( 2 ) 

RETURN 
END 


******»-• 
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V.  Parallel  Computational  Techniques 

A.  Introduction 

The  success  of  any  numerical  technique  is  directly  linked  to  the 
application  of  a  suitable  computational  device  for  the  problem  at  hand. 

If  the  required  computational  algorithm  is  completely  unstructured  then 
very  little  can  be  done  in  general  to  improve  computation  over  a 
straightforward  serial  machine  which  performs  a  single  calculation  at  a 
time.  Most  problems  contain  some  independent  computational  paths,  or 
parallelisms,  however,  and  nearly  a  decade  has  been  devoted  by  the 
computer  industry  to  the  problem  of  identifying  and  exploiting  natural 
parallelisms  of  algorithms.  The  fact  that  so  many  approaches  to  the 
problem  have  developed,  on  the  other  hand,  is  testimony  to  the  com¬ 
plexity  of  the  problem  in  general.  In  this  chapter  we  will  illustrate 
the  fact  that  Bayes  Law  calculations  are  highly  structured  with  essen¬ 
tially  arbitrary  parallelism,  and  discuss  some  of  the  proposed  and 
existing  forms  of  parallel  architecture  in  relation  to  this  problem. 

B.  Parallelism  and  Bayes  Law 

Before  introducing  the  alleged  parallelism  of  Bayes  law  it  is 
appropriate  to  distinguish  among  the  various  forms  of  parallelism  which 
exist  in  computers.  We  observe  that  the  following  "levels  of  computation' 
exist  within  a  computational  system:  Electronic,  Logic  (Bit),  Functional 
Unit,  Instruction  Stream,  Process  or  Task,  Job  Step,  and  Job. 

Parallexlsm  at  the  electronic  and  bit  levels  is  essentially  universal  in 

Preceding  page  blank 
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computer  architecture  today.  Some  computers  have  parallelism  at  the 
functional  unit  level  (i.e.,  look-ahead  machines).  However,  very  few 
commercially  available  machines  exhibit  parallelism  (other  than  one  or 
two  copies)  at  the  instruction  stream  or  higher  levels.  The  reason  is 
clear:  hardware  duplication  is  expensive  and  unless  there  is  a  good 
probability  that  the  extra  hardware  will  always  be  active  (i.e.,  not 
idle)  then  the  payoffs  rapidly  diminish.  To  gain  a  quick  appreciation 
for  the  complexity  of  the  problem  consider  the  evaluation  of  the  simple 
six  term  sum  a+b+c+d+e-t-f :  If  we  only  have  one  adder  then  we  might 
consider  the  evaluation  in  the  form  (((((a+b)+c)+d)+e)+f) ,  (which, 
of  course,  is  not  unique)  since  this  evaluation  requires  only  one 
temporary  storage  location.  The  operation,  which  takes  five  add  cycles 
in  succession,  is  diagrammed  in  Fig.  1. 

Suppose,  now,  you  were  given  two  adders  which  operate  independently. 
Then  you  might  consider  the  evaluation  (((a+b)+(c+d))+(e+f)) ,  which  is 
diagrammed  in  Fig.  2.  Now  you  have  obtained  the  answer  in  three  add 
cycles  at  the  expense  of  two  temporary  storage  locations  and  one  idle 
adder  in  the  last  cycle  (since  you  only  need  to  do  five  adds).  It  may 
be  shown  that  the  computation  in  Fig.  2  is  maximally  parallel  in  the 
sense  that  no  fewer  chan  three  add  cycles  are  required,  no  matter  how 
many  adders  you  have  at  your  disposal.  Consider  the  case  with  three 
adders,  for  example,  where  after  adding  (a+b),  (c+d)  and  (e+f) 

simultaneously,  you  are  left  in  an  analogous  position  to  level  two  in 
Fig.  2.  Thus,  after  doubling  your  functional  units  you  have  achieved 
the  maximum  of  40%  decrease  in  computation  time,  so  that  additional 
units  only  decrease  the  profit/cost  ratio  for  this  problem.  An 
interesting  discussion  of  the  parallel  evaluation  of  arithmetic 
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expressions  is  given  by  Baer  and  Bovet  [2].  For  a  definition  of 
maximal  parallelism  and  a  development  of  the  associated  theory,  see 
Keller  [9]. 

Now  that  a  note  of  caution  has  been  inserted  we  proceed  to  a  dis¬ 
cussion  of  highly  structured  parallel  computations,  where  we  use  Bayes 
Law  as  an  example.  Bayes  Law  may  be  expressed  operationally  in  the 
form 

Cn+1  Vl®  -fif  1  (1) 

where  T^^.x)  is  a  spatially  varying  kernel  which  is  a  function  of  the 
product  of  probability  densities  of  the  noises  and  the  new  measurement 
Z^,  and  is  the  normalizing  constant  (i.e.,  the  integral  of 

the  right-hand  side  of  (1)  over  all  y)  .  It  should  be  clear  from  inspect¬ 
ing  (1)  that  there  is  implicitly  a  form  of  parallelism  called  for,  in 
that  for  e^ery  y.  in  Jn+j,  a  d-dimensional  convolution  Integral  must 
be  computed.  The  explicit  form  of  parallelism  will  be  a  function  of  the 
nature  of  the  (finite-dimensional)  algorithm  chosen  to  implement  (1). 

The  point-mass  approach  of  Bucy  and  Senne  [5]  maps  (1)  into  an  equivalent 
matrix  multiplication.  The  least-squares  series  expansions  of  Hecht  [8], 
Alspach  [1],  and  Center  [6]»  for  example,  replace  (1)  by  an  equivalent 
update  for  expansion  coefficients.  The  exact  implications  on  the 
structure  of  the  computer  will  of  course  be  dependent  upon  the  form  of 
the  algorithm.  To  be  specific,  then,  since  many  of  the  associated 
problt  <".'e  similar,  we  vi.1.1  discuss  the  matrix  multiply  analogy  to 

(1) .  where  symbolically  we  write 

M 


i  =  1,...,M  , 


(2) 


■-  '  “~r, -C7 


"  3f r  W  +*'  '& -V"  «■  ■ 
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M 

Cn+1  -  £  -WV  > 


(3) 


WV  =  Jn+AVCn+l  ’  1  =  1 . M 

M 

Xn+l|n  ^i  ^n+l^i'* 


(A) 

(5) 


The  overall  structure  of  (2)  -  (5)  (where  we  choose  to  discuss  the 

conditional  mean  estimate  as  a  specific  example)  illustrate  both  purely 

parallel  (2),  (4)  and  essentially  serial  (3),  (5)  computations.  While 

(by  analogy  with  the  example  in  Figs.  1-2)  it  is  obvious  that  some 

parallelism  could  be  built  into  (3)  and  (5),  it  is  also  clear  that  a 

processor  which  is  optimized  to  perform  the  totally  parallel  operations 

(2)  and  (4)  would  be  mostly  wasted  on  the  scaling  and  estimation 

integrals  of  (3)  and  (5).  Thus  it  is  clear  that  even  in  this  highly 

structured  problem  a  combination  of  computer  architectures  would  be 

necessary  to  optimize  computational  speed  with  minimum  overhead  (i.e., 

idle  functional  units  or  processors).  Ideally,  then,  we  might  envision  a 

parallel  computer  simultaneously  evaluate.-.  Tn(y^,£.)  Jn(£.)  for 

i  »  1 . H  and  accumulates  the  sum  (2)  in  successively  for 

j  =  1,...,M.  (If  we  had  enough  processors  and  temporary  storage  we 

could  even  imagine  tree-structuring  tne  latter  computation  in  the  form 

of  Fig.  2.)  Imagine,  further,  that  we  only  had  to  use  in  (2)  so 

that  our  J'  .(y.)  is  off  by  u.  factor  of  C  ,  which,  simultaneous  to 

the  evaluation  of  (2),  we  are  evaluating  in  an  auxiliary  serial  machine, 

and  x(njn-l),  which  we  are  also  evaluating  in  a  serial  machine  (modulo 

C  )  using  J'  instead  of  J  .  The  resulting  computation  timing  could 
n  6  n  n 

take  the  form  in  Fig.  3. 


Combining  Serial  and  Parallel  Computations 
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It  is  reasonable  to  assume  that  if  the  serial  and  parallel  processors 
in  Fig.  3  are  on  the  order  of  speed,  then  block  1  would  be  the  slowest, 

4  the  next  slower,  2  the  next,  and  3  the  fastest.  Thus,  assuming 
synchronization  initially,  the  serial  computer  would  be  idle  waiting  for 
block  1  to  finish,  then  block  3  would  finish  before  (4),  but  since  1 
doesn't  require  the  result  of  4,  the  parallel  computations  may  proceed 
directly  without  a  wait  from  3  to  1,  Finally,  assuming  the  sum  of  the 
execution  times  of  4  and  2  is  less  than  the  sum  of  the  execution  times 
of  1  and  3,  the  serial  processor  will  again  be  v>aiting  at  the  synchroniza¬ 
tion  point  in  the  next  cycle  when  the  parallel  processor  finishes. 

Although  the  computations  in  Fig.  3  are  somewhat  simplified  (no 
allowance  has  been  made  for  floating  grids,  for  example)  the  figure 
clearly  illustrates  the  desired  tradeoff  between  essentially  serial 
(overhead)  calculations  and  mostly  parallel  computations.  While  it 
should  be  understood  that  a  parallel  serial  computer  could  be  designed 
around  the  problem  in  Fig.  3,  it  is  not  reasonable  to  expect  that  such  a 
special  purpose  machine  is  likely  to  evolve  in  the  near  future.  Thus  it 
is  more  relevant  to  consider  the  Bayes  estimation  problem  in  light  of 
current  existing  parallel  architectures. 

C.  Look-Ahead  Processors 

The  only  existing  computational  device  which  even  closely 
approximates  a  "general  purpose"  parallel  processor  (with  respect  to  a 
single  instruction  stream)  Is  the  look-ahead  machire,  as  exemplified  by 
the  IBM  360/91  or  the  COC  6600  (or  7600)  .  These  machines  have  a  multi¬ 
plicity  of  distinct  (although  not  necessarily  dissimilar)  functional 
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units.  The  CDC  6600,  for  example,  has  two  multipliers,  two  incrementers, 
an  adder,  an  integer  adder,  a  divider,  a  boolean  unit,  a  shift  unit,  and 
a  branch  unit  for  a  total  of  ten  distinct  units  [7].  In  addition,  there 
are  collections  of  operand  registers  (6),  resultant  registers  (2), 
address  registers  (8),  increment  registers  (8),  and  instruction  registers 
(8)  to  provide  for  high  speed  temporary  data  flow  and  simultaneous 
instruction  operation  together  with  the  above-mentioned  hardware,  the 
look-ahead  processor  has  a  set  of  conflict  rules  whereby  the  reservations 
and  use  of  the  functional  units  is  directed  with  the  goal  of  maximum 
local  parallel  operation. 

Short  of  assembly  language,  program  there  is  no  guarantee  of  optimal 
use  of  the  look-ahead  processor  hardware.  Giver,  the  look-ahead  rules, 
there  are  many  situations  which  will  cause  the  unit  to  hang-up  without 
all  units  active.  For  example,  one  unit  may  require  an  operand  to  be 
loaded  from  the  same  memory  page  that  a  previously  implemented  store 
operation  has  tied  up,  or  a  functional  unit  might  be  awaiting  the  second 
of  two  operands  which  is  not  available  for  a  variety  of  reasons.  There 
are,  however,  some  programming  techniques  which  will  increase  the 
prospects  for  successful  parallelism  with  the  look-ahead  processor.* 

We  will  discuss  some  of  these  principles  in  light  of  the  Bayes-Law 
computation. 

Basically  one  should  attempt  to  reduce  the  interdependence  of 
adjacent  calculations  to  a  minimum.  We  would  start  out  with  the  same 
basic  outline  as  in  Fig.  3  and  expand  the  outlines  so  that  locally  at 


*  Although  most  of  the  simulations  described  in  this  repovt  were  performed 
on  a  CDC  6600,  due  to  a  number  of  factors  no  attempt  was  made  to  optimize 
the  code. 
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most  one  or  two  arithmetic  operations  are  dependent.  This  might  mean, 

for  example,  a  partial  realization  of  the  operator  T  including 

n 

breaking  down  the  exponential  function  if  there  is  one,  performing 
the  partial  realizations  for  all  j  in  the  new  array  before  returning 
for  the  next  stage  in  the  decomposition  of  T^.  The  arrays  of  storage 
must  be  layed  out  with  the  page  partitions  of  core  memory  in  mind,  since 
sequential  access  to  the  same  page  causes  avoidable  delays.  The  CDC 
6600,  for  example,  has  the  page  address  contained  in  the  least  signifi¬ 
cant  address  bits,  so  that  adjacent  memory  addresses  are  in  different 
memory  pages.  The  obvious  implication,  then,  is  that  simultaneous 
operand  arrays  should  be  interleaved  to  maximize  the  probability  that 
all  local  computations  refer  to  consecutive  core  locations. 

The  fact  that  look-ahead  processing  is  reasonably  successful  for 
most  scientific  problems  results  from  the  fact  that  most  algorithms 
exhibit  some  degree  of  local  parallelism,  regardless  of  optimization 
efforts.  When  an  extremely  parallel  structure  is  encountered,  however, 
it  is  possible  to  take  maximum  advantage  of  look-ahead  architecture 
by  maximizing  local  parallelism,  since  the  look-ahead  stack  seldom 
contains  more  than  a  few  dozen  instructions . 

D.  Array  Processors 

As  the  name  implies,  an  array  processor  consists  of  an  array  of 
identical  processing  delements.  The  complexity  of  the  individual 
processors  may  vary  bi-t  they  all  operate  on  the  same  instruction  stream 
with  different  data.  As  an  example,  consider  the  llliac  IV,  built  in 
conjunction  with  University  of  Illinois  by  the  Burroughs  Corporation  [3]. 
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The  Illiac  contains  an  8  x  8  array  of  sophisticated  processors,  driven 
in  lock  step  by  a  Burroughs  6500  computer.  Tse  and  Larson  [10],  [13] 
have  considered  estimation  and  Bayes  calculations  on  the  Illiac 
structure.  Although  the  Illiac  has  considerable  computational  potential, 
its  special  structure  precludes  arbitrary  higher  level  programming. 

Instead  the  entire  problem  computations,  array  storage,  and  data  manipula¬ 
tions  must  be  customed  tailored  for  this  special  machine.  The  necessary 
synchronization  of  64  processors  can  lead  to  considerable  processor 
idleness  if  the  computations  are  not  laid  out  carefully. 

E.  Associative  Processors 

Another  class  of  parallel  machines  consists  of  those  containing  an 
associative  memory,  as  typified  perhaps  by  the  Goodyear  Associative 
Processor  [11].  Each  processor  word  in  an  associative  memory  is 
extremely  long  (i.e.,  several  hundred  bits).  Basically  the  associative 
memory  word  may  contain  several  numbers  along  with  some  identification 
bits.  Using  maximally  parallel  binary  compare  logic,  certain  cells  are 
identified  on  the  basis  of  their  identification  bits  and  then  a  pre¬ 
arranged  arithmetic  operation  or  series  of  operations  is  performed  on 
the  selected  words.  The  concept  is  certainly  an  interesting  one,  but 
we  are  unaware  of  any  stand-alone  uses  other  than  sorting  and  cataloguing 
for  the  existing  machines. 

F.  Pipe-Line  Processor 

The  pipe-line  concept  takes  advantage  of  the  fact  that  processor 
and  logic  speeds  are  generally  much  faster  than  memory  speeds,  so  that 


certain  compute-bound  parallelizable  algorithms  are  constrained  to 
operate  at  memory  speeds  in  conventional  hardware.  If  large  numbers  of 
identical  calculations  are  to  be  performed  on  arrays  of  data,  then  one 
possible  solution  is  to  page  through  memory,  taking  one  number  from 
each  page  and  to  insert  these  numbers  into  a  long  chain  of  processors 
arranged  in  analogy  with  a  pipeline,  with  registits  between  each  pro¬ 
cessing  element.  Ideally  a  sequence  of  computations  is  performed  on 
each  group  of  data  inserted  into  the  line  and,  if  proper  allowance  is 
made  for  queing-theoretical  concerns,  the  answers  are  swept  off  the  end 
register  and  reinserted  into  memory  at  the  same  average  speed  that  the 
original  numbers  were  removed.  Again,  as  with  all  highly  structured 
parallel  processors,  a  considerable  amount  of  program  optimization  and 
data  structuring  is  necessary  to  allow  for  efficient  continuous  use  of 
the  hardware.  The  nonavailability  of  variables  to  fill  the  pipe  line, 
as  possible,  for  example,  if  an  isolated  (overhead)  calculation  is 
required,  or  if  data  arrangement  operations  can't  keep  up  with  the 
pipe-line  computations,  can  be  very  costly. 

In  principle,  at  least,  the  CDC  STAR  (the  first  existing  pipe-line 
machine)  promises  to  be  the  most  nearly  ideal  structure  for  the  Bayes  Law 
computations  in  existence.  There  remains  only  speculation  regarding  the 
ideal  computer  for  Bayes  Law,  but  it  wouldn't  be  surprising  if  it 
tunned  out  to  be  a  combination  of  the  machine  concepts  discussed  above 
in  this  Chapter.  It  is  expected  that  actually  programmed  tests  of 
Bayes  Law  will  provide  the  best  evaluation  of  the  parallelism  conjectures 
we  have  addressed  here.  It  is  clear,  however,  based  on  the  CDC  6600 
experience,  that  some  form  of  parallelism  will  be  beneficial  to  the  Bayes 


Law  computations. 
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G.  Hybrid  Computer  Methods 

While  most  of  the  present  chapter  has  been  devoted  to  parallel 
digital  computer  architectures,  it  is  appropriate  to  investigate  the 
intrinsic  parallel  computational  structure  of  the  analog  computer. 

Since  it  is  possible  to  hook  up  essentially  an  unlimited  number  of 
operational  amplifiers  operating  lock-step  in  parallel,  it  appears 
natural  to  study  the  possibility  of  implementing  the  essentially 
parallel  aspects  of  the  Bayes-Law  computations  on  an  analog  device. 

In  his  dissertation  Miller  [12]  has,  in  fact,  already  undertaken  such 
a  study.  In  the  recent  paper  of  Bucy,  Merritt,  and  Miller  [4]  (repro¬ 
duced  in  Additional  Appendix  F  to  this  report),  some  of  the  important 
details  of  this  research  are  reported. 

Basically,  Miller  has  determined  that  an  efficient  use  of  a  hybrid 
(analog  and  digital)  system  is  (1)  to  use  the  digital  computer  to  compute 
estimates,  normalize  densities,  and  accumulate  the  Bayes  integral  compu¬ 
tations,  and  (2)  to  use  the  analog  computer  to  compute  in  parallel  the 
probability  density  terms  (exponentials  in  the  gaussian-noise  case), 
and  perform  a  parallel  continuous  integration  of  the  unnormalized  Bayes 
integral  -  stopping  at  discrete  time  instances  to  send  the  results  to 
the  digital  computer.  While  on  the  surface  the  speed  advantages  over 
conventional  digital  serial  computations  look  exceedingly  attractive, 
care  must  be  used  in  evaluating  the  effects  of  various  analog  circuit 
inaccuracies  to  determine  the  attainable  accuracy.  Miller  has  used  an 
extensive  digital  simulator  of  a  hybrid  system  with  programmable 
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inaccuracies  to  discover  the  accumulated  effects  of  realistic  analog 
anomalies  such  as  amplifier  noise,  offset,  drift,  nonlinearities, 
timing  mismatch,  and  limited  dynamic  range. 

Because  of  the  above  segmentation  of  the  analog  and  digital 
components  of  the  problem,  and  because  Miller  is  considering  point-mass 
representations  of  the  a  posteriori  densities  in  the  digital  machine, 
it  is  natural  to  consider  Miller’s  hybrid  technique  an  analog  generaliza¬ 
tion  of  the  point-mass  approach  to  the  Bayes-Law  computations.  With 
that  in  mind  Miller  has  undertaken  comparisons  between  the  two  methods 

q 

on  a  one-dimensional  x  sensor  problem.  In  summarizing  his  conclusions, 
Miller  estimates  ihat  in  computing  one  estimate  for  a  four-dimensional 
problem,  one  hybrid  computer  with  250  integrators  and  multipliers  could 
accomplish  the  task  in  10  seconds,  whereas  49  CDC  6600's  operating  in 
parallel  would  require  3  seconds.  In  summarizing  the  accuracy,  he 
promises  hybrid  accuracy  of  somewhere  between  0.1  and  1.0  percent 
disagreement  between  the  hybrid  and  the  optimal  estimates.  These 
conclusions  certainly  suggest  that  further  development  of  hybrid  systems 
may  indeed  be  profitable. 


v  -w.  J.  w  -w1*? 
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VI.  A  Passive  Receiver:  Bearings  -  Only 
Tracking  (AWACS) 

A.  Introduction 

The  first  nonsc3lar  example  of  the  general  Bayesian  nonlinear 
estimation  problem  which  has  been  studied  in  considerable  detail  was 
introduced  by  Bucy,  Geesey,  and  Senne  [3].  The  problem  involves  the 
detection  and  tracking  of  targets  based  on  passive  (i.e.,  without  the 
use  of  radar  or  sonar)  receivers.  The  basic  problem  description  has 
arisen  from  a  variety  of  applications,  most  of  which  involve  some  form 
of  strategic  sleuthing,  either  by  an  airborne,  a  landbased,  or  a  sea¬ 
going  observer.  In  these  applications  it  is  presupposed  that  the  only 
form  of  measurement  available  to  the  sensor  is  bearing  information, 
resulting  from  directed  energy  radiated  from  the  target  in  the  form  of 
radar  or  sonar.  Thus,  in  order  to  extract  range  information  from  the 
noisy  bearing  measurements  it  is  necessary  for  either  multiple  sensors 
to  be  employed  at  a  variety  of  known  geographic  positions,  or  a  single 
sensor  to  be  moved  through  an  appropriate  sequence  of  positions  (i.e.,  a 
known  orbit).  The  latter  sensor  is  most  important  for  a  variety  of 
reasons:  (1)  there  is  no  need  for  multiple  sensor  location  calibration, 
a  problem  which  is  frequently  as  difficult  as  the  passive  receiver 
problem,  (2)  it  is  frequently  impractical  to  devote  a  multiplicity  of 
sensors  to  track  only  one  target,  and  (3)  it  is  usually  desirable  to 
have  the  sensor  and  its  support  vehicle  be  one  self-contained  and 

Preceding  page  blank 


132 


independent  unit.  Thus  a  sequential  estimation  problem  is  identified 
with  the  passive-receiver. 

To  be  specific  we  will  assume  that  the  target  is  a  radar-bearing, 
slowly-moving  object,  such  as  a  surveillance  ship,  and  we  will  let  the 
sensor-bearer  be  an  orbiting  aircraft,  such  as  the  Airborne  Warning  And 
Command  System  (AWACS) ,  which  is  currently  under  Air  Force  contractor 
development.  We  don't  intend  to  describe  the  complete  AWACS  problem  in 
detail  here,  but  only  to  appeal  to  one  of  its  primary  objectives  - 
passive  tracking,  as  an  example  of  the  kind  of  problem  we  are  studying. 
We  expect  that  the  very  simplified  example  we  have  chosen  to  discuss 
here  will  be  sufficiently  interesting  so  as  to  motivate  a  method  of 
attack  for  many  other  aspects  of  the  AWACS  problem. 

The  most  important  characteristic  of  the  bearings-only  receiver 
problem  is  shown  in  Fig.  1.  Here  we  see  the  basic  geometry  of  the 
receiver  (taken  to  be  restricted  to  two-dimensions  for  simplicity). 

The  target  T  is  located  at  coordinates  (Xj ,  x2)  and  the  sensor  S 
is  located  at  coordinates  (Sj,  S2) .  We  assume  that  sensor  i>  capable 
of  measuring  the  bearing  angle  h(x),  given  by 


h(x)  =  tan-1 


(1) 


Of  course  the  measurement  of  h(x)  will  in  most  cases  be 
contaminated  by  noise,  which  we  will  assume  to  be  additive,  resulting 
in  the  measurement  scheme 


dz(t)  =  h(x)dt  +  dv(t)  (2) 

where  h(x)  is  given  by  (1).  Some  possible  sources  of  the  noise  might 
be  receiver-noise  (i.e.,  shot  noise),  Electronic  Counter  Measures  (ECM) , 


or  atmospheric  noise.  In  any  case  we  will  assume  that  the  noise  is 
characterized  by  a  known  distribution  (specifically  taken  to  be  Gaussian 
for  purposes  of  example) . 

As  was  mentioned  above,  the  sensor  using  the  bearings-only  scheme 

2  2  y9 

(1)  would  not  be  able  to  measure  the  rarge  R  =  [(Xj  -  S^)  +  (x2  -S2)  ]  /z, 

regardless  of  the  target  motion  or  the  number  of  measurements.  That  is, 
the  range  is  unobservable  to  a  stationary  sensor.  Thus,  it  is  necessary 
to  require  that  the  sensor  proceed  in  a  prescribed  orbit,  which  we  will 
take  to  be  a  unit  circle,  for  example.  We  further  assume  that  the 
measurements  are  to  be  taken  only  at  discrete  instants  in  time  (at, 
say,  t  =  nA,  n  =  0,1,2,...).  The  resulting  measurement  scheme  takes 
the  form  of 


z(n)  =  tan" 


x2(n)  -  sin  B  (n+1) 
Xj(n)  -  cos  B  (n+1) 


+  v(n) 


where  we  have  assumed  that  the  sensor  proceeds  counter-clockwise  around 
the  circle  at  the  rate  B  radians  per  sample. 

In  order  to  make  the  bearings-only  problem  into  a  non-trivial 
filtering  problem,  it  is  necessary  to  introduce  a  stochastic  model  for 
the  allowable  motions  of  the  target.  There  is,  of  course,  a  wide  variety 
of  useful  possibilities  which  might  be  selected.  Consider,  for  example, 
the  cases  of  essentially  stationary  (but  with  random  velocity  vector), 
essentially  constant  velocity  (but  with  random  accelerations),  and  the 
various  orders  of  acceleration  models.  An  example  of  an  essentially 
stationary  target  would  be  an  unanchored,  but  undriven  ship,  being 
buffeted  by  the  waves.  An  essentially  constant  velocity  would  result 
possibly  from  an  "airliner  target,"  or  one  that  is  attempting  to 
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navigate  a  fixed  course,  but  is  subject  to  random  aerodynamic  forces. 
Finally,  the  various  orders  of  acceleration  models  would  usually  result 
from  maneuvering  targets.  The  dimension  of  the  underlying  state  space 
increases  by  two  for  each  increase  in  degree  of  freedom  of  the  motion 
model,  so  that  a  stationary  target  is  described  in  two  dimensions,  a 
constant  velocity  model  has  four,  and  a  constant  acceleration  model  has 
six.  If  the  acceleration  is  assumed  to  be  dynamic,  then  the  number  of 
control  states  may  increase  indefinitely.  In  addition,  one  may 
envision  measurement  bias  states,  as  well  as  many  other  unknown 
parameters,  so  that  an  initially  modest  problem  turns  into  a  monster. 

We  could  easily  have  chosen  the  most  complicated  form  of  the  problem 
for  this  example.  On  the  other  hand  we  feel  that  is  counLer-productive 
for  two  reasons:  (1)  our  principle  goal  is  to  illustrate,  with  very 
simple  examples,  what  the  Bayes-law  formulation  says  about  nonlinear 
estimators,  and  hopefully  to  provide  a  basic  understanding  of  the 
characteristics  of  optimal  estimates,  and  (2)  we  are  obli&ed  to  perform 
extensive  Monti  Carlo  experiments  in  order  to  learn  what  characteristics 
of  the  conditional  densities  are  the  most  important  for  estimation 
purposes,  and  to  provide  high-confidence  performance  analyses. 
Consequently,  we  have  chosen  to  study  the  simplest  example  of  the  passive 
receiver  problem  which  still  has  dynamics  -  the  essentially  stationary 
target  (with  random  velocity) .  The  equations  of  the  assumed  model  take 
the  form 

x(n+l)  =  $x(n)  +  u(n)  ,  (4) 

where  $  is  a  2x2  transition  matrix,  and  { u(n) }  is  a  sequence 
of  mutually  independent  random  vectors,  having  covariance  Q.  If  the  $ 
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matrix  is  an  identity  matrix,  then  performs  a  random  walk  ii 
two  dimensions.  We  will  also  be  interested  in  studying  examples  of 
stable  as  well  as  unstable  transitions. 


B.  The  Linearized  Estimator 


One  obvious  candidate  for  an  approximate  estimate  of  the  position 
of  the  target  is  the  linearized  or  extended  Kalman-Bucy  filter.  If  we 
choose  to  linearize  about  the  current  filter  estimate  x^(n|n),  th^r. 
the  linearized  measurement  function  H'n)  takes  the  form 


H(n)  =  Vxh[x&(njn)] 


a2  +  b2  a2  +  b2 


A  ^  1  i  A  A  2  i 

where  we  have  taken  a  =  x^(njn)  -  cos  6(n+l),  b  =  x^(n|n)  -  sin  8(n,',l). 
Formally  we  write  the  equations  for  the  linearized  filter  as 


x„(n+l|n)  =  $x^(n|n)  , 


L<o|-i)  =  0  , 


,  (n+l|  n+1)  =  x^ (n+1 |  n)  +  A(n+1)  j£(n+l)-hn+1[x^(n+l|n)  ]  j  , 


A(n+1)  =  P(n+l|n)H(n+l) 'R_1 (H(n+l)P(n+l|n)H(n+l) ’i-R]-1  ,  (8) 

P (n+1 j  n+1)  =  [I-A(n+l)H(n+l) JP(n+l|n) [I-A(n+l)H(n+l) ] '+A(n+1) R  A’ (n+1), (9) 


P(n+l|n)  =  <M7 (n | n) <t> 1  +  Q  ,  (1 

and  P(0 1 -1)  =  r(0) .  In  attempting  to  apply  (6)  -  (10),  however,  a 
difficulty  arises  in  the  timing  requirements,  since  the  filter  update 
(7)  requires  A(n+1)  which  in  turn  requires  H(n+1)  in  (8)  and  (9), 
but  H(n+1)  depends  on  x, (n+l|n+l)>  which  is  not  yet  available! 
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A  standard  solution  to  the  dilemma  is  to  evaluate  the  gradient  (5)  at 
the  predicted  estimate  x^(n|n-l).  If  the  predicted  and  filtered 
estimates  differ  substantially,  however*  or  if  the  measurement  function 
is  sensitive,  such  as  the  case  in  (.5) ,  then  it  is  possible  to  partially 
restore  the  proper  timing  by  iterating  (5),  (9),  (8),  and  (7),  starting 
with  (5)  evaluated  at  2^(r  jn-1)  as  above  and  continuing  until  the 
filtered  estimate  (7)  stabilizes.  After  each  iteration  the  current 
"quasi-filtered"  estimate  from  (7)  is  used  to  evaluate  the  next  gradient 
(5) .  After  the  filtered  estimate  converges  the  prediction  sequence  (10) 
and  (6)  can  then  be  performed,  thereby  completing  the  estimation  cycle. 
Denham  and  Pines  [7;  have  reported  substantial  improvement  by  using  the 
iterated  filter  for  some  tracking  problems. 

It  turns  out  that  if  the  linearized  filter  or  iterated  filter  fails 
to  perform  properly,  there  are  essentially  an  unlimited  number  of  "fixes" 
or  engineering  modifications  to  be  found  in  the  literature.  Most 
fixes  involve  some  form  of  dynamic  stabilization  of  the  equations  (6)  - 
(10)  to  avoid  such  unpleasant  characteristics  as  divergence  of  the 
errors,  negative  definite  P's,  zero  gains  (A),  or  unstable  estimates. 

We  find,  in  fact,  that  a  simple  limiting  operation  removes  the  tendency 
for  this  particular  linearized  filter  to  go  unstable  (see  Appendix  A) 
for  certain  initial  conditions. 

C.  Application  of  Nonlinear  Filtering 

The  two  approaches  which  we  have  applied  to  the  Bayes-Law  calcula¬ 
tions  for  the  passive  tracking  problem  include  point  masses  with  a 
floating  grid  and  Gaussian  sums.  De  Figuierido  [6]  has  recently 
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applied  spline  functions  to  the  problem,  but  we  have  not  had  opportunity 
to  confirm  his  results. 

In  applying  the  point  mass  technique  we  endeavored  to  utilize  the 
full  generality  of  the  method  as  described  by  Bucy  and  Senne  [4]  (see 
Chapter  III) .  That  is,  we  centered  the  grid  predictively  (including 
rotation  to  align  the  grid  coordinates  with  the  eigenvectors  of  the 
predicted  covariance  matrix).  We  also  took  advantage  of  the  unimodal 
cha'icter  of  most  of  the  densities  and  used  the  ellipsoidal  tracking 
scheme  for  calculating  the  Bayes  integral.  The  equations  implemented, 
in  fact,  were  so  similar  to  tna  ideal  that  there  is  little  point  in 
repeating  them  here. 

In  his  dissertation,  Alspach  (1]  observed  that  the  gaussian  sum 
approximation  to  the  filtering  problem  could  be  simplified  whenever  the 
plant  was  linear,  the  noise  is  gaussian  and  the  measurement  nonlinearity 
has  certain  symmetry  properties.  The  simplification  involves  identifica¬ 
tion  of  region  B  in  the  state  space  where  z(»)  =  h_( x) .  Then,  if  B 
is  a  simple  surface  in  the  state  space,  the  gaussian  sum  may  be  con¬ 
centrated  in  B.  It  turns  cut  that  in  the  passive  receiver  problem, 
the  region  B  is  a  simple  hyperplane  passing  through  the  receiver 
position.  This  may  be  demonstrated  as  follows:  Let 


„  i(x 

jUl’ 


x2): 


z  =  h(x 


x2)  j 


then  set 


or,  equivalently, 


resulting  in 


tan  z 


x2  =  (tan  z)x1  +  S2  -  Sj  tan  z  . 


(ID 


Furthermore,  simple  geometric  observations  allow  one  to  eliminate  all 
points  outside  or  inside  the  sensor  orbit,  depending  on  the  apriori 
information  concerning  the  target's  position.  Thus  the  sensor  density 
function  is  symmetric  about  the  line  (11)  and  cone  shaped  (see  Alspach 
[1]  for  figures) .  To  implement  the  gaussian  sum  approximation,  then, 
Alspach  simply  places  gaussians  on  the  line  (11) ,  resulting  in  a  one- 
diinensional  realization  of  the  problem. 
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Appendix  A.  First  Monte  Carlo  Experiments 
Point  Masses  versus  Linearized 

In  this  appendix  we  summarize  the  initial  experiments  which  we 
performed  on  the  Burroughs  B5500  machine  and  reported  in  Bucy  and 
Senne  [4].  We  make  some  revised  accuracy  estimates  concerning  the 
reliability  of  the  experiments  and  some  additional  interpretations  of 
the  results. 

The  parameters  for  the  initial  tests  were  selected  essentially 
arbitrarily  to  be 


0.5  0  " 

Q  - 

0.1  0.05 

0  1 

* 

0.05  0.1 

1  , 

R  =  0 

.1 

and  J  ~  N(0,I) . 

o  — 

In  addition  we  stored  4  standard  deviations  on  the  floating  grids  (based 
on  the  predicted  covariance)  and  used  a  3  standard-deviation  ellipse 
for  integrating  Bayes  law. 

The  raw  Monte  Carlo  data  may  be  seen  in  Table  A-l,  taken  from  Bucy 
and  Senne  [4].  The  estimates  for  one-step  prediction  jc(n [ n— 1)  were 
computed  for  n**l,,..,10,  with  x(0|“l)  =  0.  as  an  initial  condition. 

In  addition,  the  ideal  pure  prediction  covariance  Z^  =  ^n-l^'  +  ^ 
is  tabulated  for  reference.  The  pure  prediction  covariance  is  the 

A  A 

theoretical  error  performance  of  the  estimate  x  (n+1)  =  $x  (n) , 

z  z 

x  tO)  «  0,  or,  equivalently,  the  "zero-state"  or  "no-information" 

— z  — 

optimal  estimate.  The  zero-state  predictor  covariance  may  be  interpreted 
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as  the  upper  bound  on  the  performance  cf  the  optimal  nonlinear  predictor. 
The  error  performance  of  the  previous  filtered  estimate  corresponding 
to  the  entries  in  Table  2  may  be  found  by  inverting  the  transformation 
P(n+l|n)  =  <fP(n|n)<l>'  +  Q,  since  the  dynamics  are  linear.  The  results 
are  shown  in  Table  A--2,  also  taken  from  Bucy  and  Senne  [4]. 


Table  A-l.  Monte  Carlo  Performance 
of  the  Optimal  and  Linearized  Predictors 

Optimal  nonlinear  predictor  Linearized  predictor  Zero  state  predictor 


n 

Average 

error 

Average 

covariance 

Average 

error 

Average 

covariance 

Theoretical 

covariance 

1 

-0.094 

0.078 

0.359 

0.230 

0.230 

0.857 

-0.371 

0.477 

0.645 

-0.765 

-0.765 

2.728 

0.350 

0.050 

0.050 

1.100 

2 

-0.008 

0.042 

0.146 

0.068 

0.068 

0.599 

-0.277 

0.081 

0.401 

-0.183 

-0.183 

1.255 

0.188 

0.075 

0.075 

1.200 

3 

-0.004 

0.026 

0.131 

0.051 

0.051 

0.354 

-0.158 

-0.041 

0.222 

0.012 

0.012 

0.623 

0.147 

0.088 

0.088 

1.300 

4 

0.007 

0.003 

0.127 

0.088 

0.088 

0.351 

-0.081 

-0.084 

0.158 

0.074 

0.074 

0.449 

0.137 

0.094 

0.094 

1.400 

5 

0.015 

-0.022 

0.129 

0.069 

0.069 

0.342 

-0.069 

-0.106 

0.155 

0.113 

0.113 

0.723 

0.134 

0.097 

0.097 

1.500 

6 

0.038 

0.025 

0.177 

0.099 

0.099 

0.369 

-0.128 

-0.315 

0.255 

0.589 

0.589 

3.813 

0.134 

0.098 

0.098 

1.600 

7 

0.075 

0.065 

0.217 

0.140 

0.140 

0.416 

-0.188 

-0.362 

0.291 

0.824 

0.824 

6.439 

0.133 

0.099 

0.099 

1.700 

8 

0.063 

0.093 

0.159 

0.112 

0.112 

0.455 

-0.186 

-0.316 

0.223 

0.549 

0.549 

5.890 

0.133 

0.100 

0.100 

1.800 

9 

0.007 

0.043 

0.130 

0.076 

0.076 

0.364 

-0.141 

-0.321 

0.164 

0.262 

0.262 

4.950 

0.133 

0.100 

0.100 

1.900 

10 

0.026 

0.036 

0.136 

0.081 

0.081 

0.326 

-0.146 

-0.428 

0.164 

0.148 

0.148 

3.978 

0.133 

0.100 

0.100 

2.000 
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Table  A-2.  Monte  Carlo  Perfonnance  of  Optimal 
and  Linearized  Filters 


Optimal  nonlinear  filter  Linearized  filter 


n 

Average  covariance 

Average 

covariance 

1 

1.038 

0.361 

2.180 

-1.630 

0.361 

0.757 

-1.630 

2.628 

2 

0.184 

0.036 

1.206 

-0.465 

0.036 

0.499 

-0.465 

1.155 

3 

0.123 

0.003 

0.487 

-0.077 

0.003 

0.254 

-0.077 

0.523 

4 

0.108 

0.075 

0.232 

0.049 

0.075 

0.251 

0.049 

0.349 

5 

0.117 

0.039 

0.221 

0.126 

0.039 

0.242 

0.126 

0.623 

6 

0.309 

0.097 

0.620 

1.078 

0.097 

0.269 

1.078 

3.712 

7 

0.469 

0.180 

0.764 

1.547 

0.180 

0.316 

1.547 

6.339 

8 

0.234 

0.124 

0.491 

0.999 

0.124 

0.355 

0.999 

5.790 

9 

0.119 

0.053 

0.257 

0.425 

0.053 

0.264 

0.425 

4.850 

10 

0.142 

0.063 

0.256 

0.196 

0.063 

0.226 

0.196 

3.878 

In  reviewing  these  experimental  results  it  must  be  pointed  out  that 
the  confidence  analysis  of  Bucy  and  Senne  [4]  is  in  error.  The  one-sigma 
gaussian  confidence  level  is  (2a4/N) ^2,  and  not  (3o2/N)  ^2,  as  given 
in  the  original  paper  (see  Chapter  IV) .  Using  the  results  of  Chapter  IV 
we  will  now  assess  the  accuracy  of  the  above  test  results.  First  we 
convert  the  diagonal  terms  of  the  sampled  covariances  into  their  three 


144 


standard  deviation  confidence  limits,  taken  from  Chapter  IV.  The  results 
are  given  in  Table  A-3  for  the  predictor  covariances  of  Table  A-l.  The 
nonlinear  predictor  results  were  based  on  N=5QQ  Monte  Carlos  and  the 
linearized  predictor  was  run  for  N=2000. 

Table  A-3.  Monte  Carlo 
Confidence  Intervals  for  Predictors 


Optimal  Nonlinear  Predictor  Linearized  Predictor 


n 

First 

Coordinate 

Second 

Coordinate 

First 

Coordinate 

Second 

Coordinate 

1 

0. 302+0 . 443 

0.720+1.058 

0.589+0.713 

2.492+3.014 

2 

0.123+0.180 

0.503+0.739 

0.366+0.443 

1.146+1.387 

3 

0.110+0.162 

0.298+0.437 

0.203+0.245 

0.569+0.688 

4 

0.107+0.157 

0.295+0.433 

0.144+0.175 

0.410+0.496 

5 

0.108+0.159 

0.287+0.422 

0.142+0.171 

0.660+0.799 

6 

0.149+0.218 

0.310+0.455 

0.233+0.282 

3.483+4.213 

7 

0.182+0.268 

0.350+0.513 

0.266+0.322 

5.881+7.114 

8 

0.137+0.196 

0.382+0.562 

0.204+0.246 

5.380+6.507 

9 

0.109+0.160 

0.306+0.449 

0.150+0.181 

4.521+5.469 

10 

0.114+0.168 

0.274+0.402 

0.150+0.181 

3.633+4.395 

Almost  immediately  we.  ascertain  from  comparing  Table  A-3  with 
Table  A-l  that  the  "optimal"  estimate  can  not  possibly  be  optimal  for 
iterations  6,  7,  and  8,  since  the  entire  3o  confidence  band  lies  above 
the  zero-state  result  f  >r  those  samples.  In  fact  we  observe  a 
periodicity  in  the  errors  for  both  estimators  with  period  approximately 
equal  to  the  rotational  period  of  the  sensor  (between  6  and  7  samples) . 
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We  could  also  have  made  a  study  of  the  zero-bias  reliability  of  the 
estimates.  But,  owing  to  the  dubious  value  of  the  data,  we  choose  to 


proceed  on  to  the  more  recent  experiments . 
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Appendix  B.  More  Recent  Experimental  Results: 

Point  Masses  versus  Gaussian  Sums 

The  early  experiments  reported  in  the  previous  appendix  raised  a 
lot  of  questions  which  demanded  more,  tests  to  resolve.  What  made  the 
errors  cyclic  modulo  2rr  ?  Why  was  the  linearized  filter  unstable? 

Alspach  and  Sorenson  [2]  revealed  another  "linearized"  filter  which  did 
not  have  instabilities.  Thus  it  was  imperative  that  we  closely  compare 
our  results  with  theirs  in  order  to  explain  this  anomolous  discrepancy. 

Using  an  estimate  comparison  test  of  the  two  linearized  filters  it 
was  discovered  that  Alspach  and  Sorenson  had  modified  the  estimate 
update  equation  to  the  form 

[z(n+l)-h[x(n+l|n)]+n]mod  2n— rr  j  ,  (B-l) 

instead  of  the  original  form  (7).  We  may  intuitively  rationalize  the 
success  of  the  modified  measurement  scheme  by  examining  Fig.  B-l. 

Suppose  the  working  range  of  the  sensor  is  [ -ti , tt)  with  zero  referring 
to  the  first  coordinate  axis.  The  figure  shows  two  typical  situations 
which  might  arise  if  the  target  is  allowed  to  be  inside  the  sensor 
orbit,  as  is  the  likely  case  if  Jo  ~  N(0_,I).  Case  1  represents  the 
situation  for  0(n+l)  5  0.  In  this  case  h(x1) ,  the  true  bearing,  is 
positive,  while  the  bearing  to  the  estimated  position  is  negative. 

The  result  is  a  difference  in  bearing  greater  than  tt.  On  the  other 
hand,  if  both  the  estimate  and  the  target  remain  inside  the  sensor 
orbit,  then  some  nonzero  6 (n+1)  (such  as  given  in  Case  2  of  the  figure) 
would  result  in  bearing  and  estimated  bearing  of  the  same  sign,  so  that 

Preceding  page  blank 


x^  (n+1 1  n+1)  =  x^ (n+1 | n)  +  A(n+1) 
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the  difference  is  always  less  than  r.  Thus  the  figure  accounts  for 
the  modularity  of  the  error  performance  discovered  in  the  early 
experiments. 

A 

Now  the  filter  update  equation  (7)  contains  the  term  z  -  h(x) , 
which  in  turn  may  be  expressed  as  h(x)  -  h(x)  +  v,  so  that  even  if 
h(x)  -  h(x)  has  an  effective  range  from  -ir  to  u,  the  additive  noise 
is  gaussian  and  may  take  on  all  values.  Thus,  if  the  noise  magnitude 
is  usually  small  compared  with  the  magnitude  of  h(x)  -  h(x),  it  would 

A 

be  expected  that  the  modulation  of  the  difference  z  -  h(x)  as  in  (B-l) 
would  lead  to  improved  performance.  The  "fixed"  linearized  performance 
is  in  fact  dramatically  improved,  as  illustrated  in  Table  B-l,  which 
shows  the  Monte  Carlo  results  for  the  "old  problem". 

Table  B-l.  Monte  Carlo  Averaged  Sum  Squared  Error 
Performance  for  Predictors  -  Old  Problem 


Mean-State 

I terated 
Linearized 

"Fixed" 

Linearized 

Gaussian 

Sum 

Floating 

Grid 

Sample 

Predictor 

Predictor 

Predictor 

Predictor 

Predictor 

1 

1.250 

4.334 

1.468 

0.757 

0.866 

2 

1.388 

1.678 

0.974 

0.596 

0.674 

3 

1.447 

0.651 

0.562 

0.457 

0.631 

4 

1.537 

0.535 

0.550 

0.473 

0.408 

5 

1.634 

0.828 

0.617 

0.518 

0.403 

6 

1.734 

3.187 

0.169 

0.531 

0.464 

7 

1.833 

4.927 

0.540 

0.471 

0.674 

8 

1.933 

4.912 

0.618 

0.553 

0.454 

9 

2.03' 

4.248 

0.564 

0.578 

0.377 

10 

2.133 

3.763 

0.619 

0.641 

0.443  • 

Unstable 

Unstable 

Stable 

Stable 

Stable 
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What  is  shown  the  table  is  the  trace  of  the  sampled  covariance 
matrices  for  the  Mean-State  (ideal  predictor),  the  Iterated-Linearized, 
the  Fixed-Linearized,  the  Gaussian  Sum,  and  the  Floating  Grid  Predictors. 
Table  B-l  is  taken  from  Senne  [8],  which  was  printed  incorrectly  due  to 
an  editing  error.  The  f ixed-linearized  predictor  is  seen  in  the  table 
to  be  stable,  and,  based  on  only  100  Monte  Carlos,  essentially  equivalent 
to  both  the  Gaussian-Sum  and  Floating-Grid  predictors.  It  turns  out, 
in  fact,  that  the  "old  problem,"  as  depicted  in  Fig.  B-l,  is  far  easier 
than  originally  intended.  The  sensor  is  given  considerable  new  informa¬ 
tion  with  each  sample  if  it  orbits  around  the  target  at  the  high  rate 
of  8=1  radian  per  sample.  Thus  it  is  not  surprising  to  find  that  it 
is  relatively  easy  to  design  an  approximate  estimator  which  performs 
very  close  to  optimum.  The  straightforward  linearized  predictor 
performs  miserably,  it  happens,  as  discussed  above.  We  can  conclude, 
however,  that  the  old  problem  is  not  sufficiently  difficult  to  demon¬ 
strate  the  difference  in  performance  between  the  various  estimators. 

We  return  thus  to  the  originally  intended  geometry,  illustrated  in 

3 

Fig.  B-2,  where  the  initial  density  Jq  ~  N([^],I),  R=0.01,  and  F=I, 

so  that  the  target  is  undergoing  pure  random  walk  in  both  dimensions. 

The  Monte  Carlo  perrormance  summary  for  the  "new  problem"  is  given  in 
Table  B-2,  where  we  may  discern  that  the  linearized  predictor  (in  this 
case  the  "fix"  is  unnecessary)  converges  more  slowly  to  steady  state 
than  the  optimal  estimates,  but  the  accuracy  of  the  Monte  Carlo  (100 
sample  paths)  still  precludes  discrimination  for  steady-state  operation. 
Thus  we  have  come  to  the  conclusion  that  the  passive  receiver  with 
gaussian  noises  is  very  linearizable  in  that  the  conditional  densities 
are  very  nearly  gaussian,  at  least  in  range-bearing  coordinates. 
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Table  B-2.  Monte  Carlo  Averaged  Sum-Squared  Error 
Performance  for  Predictors  -  New  Problem 


Sample 

Mean-State 

Predictor 

Linearized 

Predictor 

Gaussian 

Sum  Predictor 

Floating  Grid 
Predictor 

1 

2.200 

1.665 

11.436* 

0.955 

2 

2.400 

1.928 

1.723 

1.020 

3 

2.600 

2.187 

1.495 

1.006 

4 

2.800 

2.229 

1.321 

1.083 

5 

3.000 

2.190 

1.208 

1.118 

6 

3.200 

2.145 

1.456 

1.359 

7 

3.400 

1.830 

1.625 

1.521 

8 

3.600 

2.105 

1.437 

1.370 

9 

3.800 

2.136 

1.423 

1.132 

10 

4.000 

2.349 

1.286 

1.316 

11 

4.200 

2.370 

1.431 

1.493 

12 

4.400 

2.114 

1.53J 

1.515 

13 

4.600 

1.883 

1.475 

1.345 

14 

4.800 

1.752 

1.398 

1.268 

15 

5.000 

1.752 

1.553 

1.441 

16 

5.200 

1.850 

1.617 

1.534 

17 

5.400 

1.627 

1.275 

1.222 

18 

5.600 

1.561 

1.271 

1.238 

19 

5.800 

1.620 

1.630 

1.314 

20 

6.000 

1.688 

1.803 

1.556 

We  note  in  passing  that  the  first  sample  estimate  of  the  Gaussian 
sum  predictor  suffers  from  a  transient  at  about  Monte  Carlo  number  50, 
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so  that  the  number  in  the  table  is  inaccurate.  The  filter  recovers 
stability,  however,  so  that  this  number  may  be  ignored. 
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Appendix  C.  A  Movie  of  Conditional  Densities 

In  the  previous  two  appendices  the  relative  success  of  the  linearized- 
type  predictor  can  be  related  to  the  validity  of  the  Gaussian  approxima¬ 
tion  to  the  conditional  density  functions.  The  simplest  way  to  destroy 
the  validity  of  the  Gaussian  assumption  is  to  provide  a  nongaussian 
initial  density  function.  Consider,  for  example,  the  detector  geometry 
illustrated  in  Fig.  C-l.  If  there  were  a  sequence  of  known  reflecting 
ionospheric  layers  above  the  aircraft  observer  and  we  were  given  an 
apriori  distribution  on  the  transmitter's  power,  then  it  is  conceivable 
that  we  might  want  to  integrate  over  all  elevations  to  maximize  detect¬ 
ability,  thus  introducing  a  multimodal  range  ambiguity  as  shown  in  the 
figure.  Probabilistically,  the  initial  condition  would  be  obtained  by 
taking  the  product  of  the  multimodal  range  ambiguity  density  with  the 
bearing  ambiguity  density.  The  result  might  look  as  in  Fig.  C-2,  where 
no  particular  scale  is  intended.  As  soon  as  the  aircraft's  detector 
circuits  obtain  a  reliable  detection,  the  aircraft  banks  left  into  a 
circle  of  unit  radius  and  activates  a  high  sensitivity  receiver  which  is 
tuned  to  one  elevation. 

In  order  to  study  the  evolution  of  the  conditional  densities  a 
movie  was  made  by  choosing  the  parameters 
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Fig.  C-l.  Detection  Geometry  in  the  Presence  of 
Multipath  Reception 
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standard  deviation  corresponding  to  0.25  radian  in  bearing  from  the 
sensor  and  a  down-range  standard  deviation  of  0.25  in  each  mode 
(corresponding  to  the  unknown  transmitter  power) .  If  the  densities  are 
plotted  isometrically  as  viewed  from  the  dire.ction  shown  in  Fig.  C-2, 
then  some  of  the  more  interesting  examples  from  the  movie  [5]  are  shown 
in  Fig.  C-3  with  sample  numbers  as  given.  As  seen  in  Fig.  C-3,  the 
density  first  peaks  up  on  the  wrong  range  when  very  little  information 
is  available  from  the  measurements,  and  later  makes  a  dramatic  change 
to  the  correct  mode  and  stabilizes.  This  behavior  is  clearly  illustrated 
in  the  solid  plot  of  Fig.  C-4  of  the  resulting  prediction  error  magnitude. 
In  contrast  the  dotted  curve  in  Fig.  C-4  shows  the  performance  for  the 
same  sample  path  of  the  linearized  predictor,  which  is  almost  identical 
to  the  mean  state  estimate  for  the  problem.  Fig.  C-4  illustrates  two 
interesting  phenomena.  The  linearized  approximation  can  be  made  useless 
by  only  a  large  uncertainty  in  initial  conditions.  On  the  other  hand, 
the  nonlinear  estimator,  which  contains  the  more  detailed  information 
about  the  conditional  densities  may  still  perform  satisfactorily.  In 
addition,  the  optimum  system  creates  a  certain  amount  of  optimism  for 
itself  initially  and  calculates  a  relatively  high-confidence  incorrect 
estimate.  Experience  with  several  such  sample  paths  has  resulted  in 
the  conclusion  that  the  situation  in  Fig.  C-4  is  just  about  the  worst 
possible  case,  that  on  the  average  the  performance  is  without  much 
improvement  for  about  half  the  iterations  (0.5  radian  of  sensor  rotations), 
and  then  proceeds  to  converge  at  about  the  same  rate  as  the  previous 
examples.  The  linearized  predictor,  on  the  other  hand,  without  a 
suitable  initial  condition  doesn't  ever  converge  at  all. 


C-2.  A  Priori  Density  Resulting 
Multipath  Detection  Ambiguities 


Fig.  C-3.  A  Typical  Sample  Path 
Resulting  from  the  Multipath  Detection  Ambiguity 


The  following  21  pages  constitute  Fig.  C-3.  Note  the  initial 
error  mode  lock  on  and  then  the  gradual  recovery  to  the  correct  mode. 
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The  examples  have  shewn  the  wealth  of  experimental  experience 
available  to  the  estimation  engineer  as  a  result  of  numerical  studies 
of  the  problem.  It  is  possible,  for  example,  to  assess  the  validity 
of  given  analytical  approximations  and  the  value  of  unused  information 
regarding  the  structure  of  the  estimation  problem.  In  addition  it  is 
possible  to  compute  practical  lower  bounds  on  perf ormance  for  nonlinear 
estimators  in  some  small  dimensional  problems,  a  fact  which  may  have 
some  important  engineering  significance  to  designers  of  sensors,  for 
example,  who  must  know  the  theoretical  limitations  on  the  performance 
of  a  given  design.  It  is  anticipated  that  future  experiments  along 
these  lines  will  substantiate  this  conjecture.  In  the  meantime  there 
appears  to  be  many  possible  application  areas  to  explore.  See,  for 
example,  Chapter  VII. 
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VII.  Example:  Optimal  Nonlinear  Phase  Demodulation 

A.  Introduction 

An  interesting  application  for  optimal  nonlinear  estimation  was 
introduced  by  Mallinckrodt,  Bucy,  and  Cheng[7],  who  considered  the 
problem  of  tracking  a  first.-order  phase  process  based  on  meas.t  rements 
of  a  modulated  signal  ir.  noise  of  the  form 

dz(t)  =  A  cos  [ioQt  +  x^tjjdt  +  dv(t) 

where  A  is  a  known  amplitude,  is  a  known  carrier  frequency,  and  Xj(t) 

is  the  message  process  being  tracked.  The  measurement  noise  is  assumed 

white.  Using  a  voltage-controlled  oscillator  the  known  carrier  may  be 

removed  by  heterodyning  down  to  base  band,  producing  bom  in-line  and 

quadrature  components  and  resulting  in  an  equivalent  two-dimensional 

measurement  process  of  the  form 

dz  (t)  cos  x  (t)  dv  (t) 

1  =  1  dt  +  1  (1) 

dz2(t)  sin  Xj(t)  dv2^  » 

where  A  has  been  taken  as  unity  without  loss  of  generality,  and  the 
noise  has  been  replaced  by  a  vector  of  mutually  independent  quantities. 

The  first-order  phase  process  studied  in  [7]  consisted  of  Brownian 
motion  with  increment  of  length  h  having  variance  qh.  In  this  paper 
we  describe  a  study  of  a  second  order  phase  process  involving  the 
integral  of  Brownian  motion,  expressed  as 


We  will  retain  the  same  measurement  model  (1)  and  let  the  noises  Vj  and 
v2  be  independent  Brownian  motions  with  paths  of  length  h  having 
variance  rh. 

The  familiar  technique  for  tracking  such  phase  processes  involves 
the  application  of  the  phase-locked  loop,  studied  thoroughly  by 
Viterbi  [8].  The  phase-locked  loop  is  a  very  ingenious  nonlinear 
estimator,  capable  of  near-optimal  performance  in  good  signal/noise 
environments.  The  steady-state  behavior  of  the  loop  is  identical  to 
that  of  the  so-called  "linearized"  or  "extended"  Kalman-Bucy  filter  for 
nonlinear  systems,  however,  as  was  illustrated  in  [7].  Thus,  the 
phase-locked  loop  is  an  excellent  example  of  a  very  successful  extended 
Kalman-Bucy  filter.  In  the  following  section  we  discuss  the  stationary 
behavior  of  the  linearized  filter  and  the  consequences  of  time 
descretization  of  the  problem.  Then  in  the  subsequent  section  we  will 
recast  the  discrete-time  solution  in  terms  of  optimal  nonlinear 
estimation,  thereby  setting  the  stage  for  a  description  of  the  numerical 
experiments . 


B.  The  Linearized  Kalman-Bucy  Filter 

Equations  for  the  standard,  continuous,  linearized  Kalman-Bucy 
filter  are  reproduced  in  Table  1  for  the  above  phase-estimation 
problem.  The  measurement  function  is  linearized  about  the  current 
estimate  XjCt)  of  the  phase.  The  approximate  filter  attempts  to  track 
the  mean  of  the  conditional  phase  density,  which,  of  course,  is  the 
minimum  mean-squared  error  estimate.  As  a  result,  the  loop  estimate 
takes  on  all  real  values,  whereas  the  original  problem  is  only 
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Table  1.  Summary  of  Continuous 
Linearized  Kalman-Bucy  Filter 

Phase  Process  Model 


0  1 

F  =  G 

0  0  , 


dx  =  Fxdt  +  Gdg,  x(o)  N[o,Z(o)] 

1  m 


L1.]  • 

Observation  Model 


w2-/' 
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dz  =  h(x)dt  +  dv  ,  E(vt-v0)(vt-vo)’  =f  R  ds  , 


r  0 

0  r  , 


h(x)  = 


sin  Xj 


Filter  Model 


dx  =  (F-KH)jxdt  +  K(d_z-dz+Hx) 
=  Fjcdt  +  Kdjz  , 


dz  =  h(x)  ,  H  =>  Vxh(x)  = 


-sin  Xj  0 

cos  x,  0  , 
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observable  modulo  2it.  Accordingly,  it  makes  sense  to  consider  the 
modulus  of  the  error  in  the  interval  [-ir,ir)»  or  equivalently  to  take 
E[(e+ir)mod  2ir-ir]  as  an  error  criterion.  Naturally  if  the  signal  to 
noise  ratio  is  high  enough  we  would  expect  the  minimum  of  the  mean- 
modulo-2ir-squared  error  to  be  essentially  equivalent  to  mean-squared 
error,  but  for  higher  noise  situations,  the  modulation  of  the  error 
would  tend  to  bound  the  maximum  mean-modulo-2ir-squared  error  (since  the 
worst  case  will  be  a  uniform  error  density  on  [— tt, tt) ) .  In  Section  C 
below  we  will  discuss  a  nonlinear  estimate  designed  to  be  defined  only 

On  [  -7T,7T)  . 


In  order  to  implement  the  nonlinear  filter  on  a  digital  computer 
we  will  need  to  descretize  time  by  some  interva]  A.  rhe  selection  of 
A  will  be  made  so  as  to  assure  essentially  the  same  steady-state 
performance  of  the  continuous  and  discrete  phase-locked  loops. 

Accordingly,  we  will  now  evaluate  the  steady-state  solution  of  the  matrix 
Ricatti-equation  for  the  linearized  filter,  which  we  know  is  approximately 
equal  to  the  error  variance  for  low-noise  applications.  If  we  make  the 
definitions 
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0  q  j 

-sin  Xj  0 
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0 

cos  Xj  0 
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L  J 

(3) 


then  the  linearized  filter  satisfies  the  equations 

d£  =  (F  -  KH)xdt  +  K(d£  -  dz_  +  Hxdt)  (4) 


where 


/s 

dz  = 


cos  x1 
sin  Xj 


dt, 


x(0)  =  0 
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K  =  PH'R-1, 


where  P  satisfies 


4—  =  FP  +  PF1  -  PIi'R-1HP  +  Q, 

at 


with  P (0)  =  2.  This  system  is  diagrammed  in  Fig.  1.  We  may  set  the 
-arivative  in  (5)  to  zero  as  a  necessary  condition  for  steady-state 
and  algebraically  determine  the  possible  steady-state  solutions. 


Accordingly,  we  obtain 


Jl  r^ 

r*  q* 


r*  q* 

/2rVt> 


JR  q 


/Tft  * 

Nr 


=  47 


y 

where  t  =  v2(— )  **,  the  filter  time  constant,  is  obtained  by  studying 

q 

the  dx  equation  (4)  with  (6)  substituted  for  P  as  follows:  The 
homogeneous  part  of  (4)  is  rewritten  as 
dx  =  (F  -  KH)  x  dt 
=  (F  -  PH'R-1  H)  x 


i  J  %  y2 

—  /2  r  z  q  ^ 


x  dt. 


The  eigenvalues  of  the  matrix  in  (8)  are  the  solutions  A  to 
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cesulting  in  A  -  (“) 74  (-1  +  i) ,  or,  using  the  definition  for  x 

JT  r 


in  (7), 


X  -  7  (-1  ±  i). 


Thus  the  solutions  to  (8)  are  of  the  form 


\  =  c0e  1  cos  (^  +  Cj), 


so  that  x  is  indeed  a  time  const&nt.  From  (4)  and  (8)  we  may  write  the 
steady-state  differential  equations  for  the  extended  Kalman-Bucy  filter 
(i.e.,  the  phase-locked  loop)  as 


dx  =  F  x  dt  +  K  (d£  -  dz) , 


'or,  equivalently. 


A  /  A  A 

dXj  =  x2  dt  +  —  (-sin  Xj  dZj  +  cos  Xj  dz2), 


and  dx2  =  i  x^  dZj  +  cos  x^  dz2).  (12) 

Refer  to  Table  1  for  a  summary  of  above  results.  The  corresponding 
results  for  the  discrete-time  filter,  obtained  by  Hecht  [6],  are 
summarized  in  Table  2.  If  we  are  interested  in  simulating  a  filter 
which  behaves  substantially  the  same  as  the  continuous  filter  then  we 
must  choose  the  sampling  interval  carefully  so  that  it  is  as  large  as 
possible  subject  to  the  constraint  that  the  discrete  covariance  in 
steady-state  is  within  an  acceptable  tolerance  of  the  continuous 
covariance . 

The  steady-state  discrete  prediction  covariance  S  and  the  filtering 
covariance  must  converge  to  the  continuous  p  of  (6)  in  the  limit  as 
A  -»•  0.  Accordingly,  we  write  S  as  a  function  of  P  and  A/x  as 


sn4>-pnS(f-1>- 


Table  2.  Summary  of  Discrete 
Linearized  Kalman-Bucy  Filter 

Phase  Process  Model 
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Observation  Model 
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Table  2.  Continued. 


Predictor  Model 


x(n  |  n-1)  =  4>x(n  |  n)  x(o|o)  =  o 


S(n)  =  *  Pd(n-1)*’  +  TQdr ’ 


Sll(a)  S12(n) 


S12(n)  S22(n) 


Filter  Model 


A  A 
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Sn(n+1) 


Table  2.  Continued. 

Equilibrium  Solution 

rd[sn(a)  +  2  s12(n)A]  -  S2lz(n)^  _  2 

Sn(n)  +  rd  “  +  S22^ 
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Setting  S(n+i)  -  S(n)  =  S,  using  a  discrete  form  of  the  Bass-Roth 
Theorem  [1], 


S  = 


(t-1) 


A/q 


d"d 


£  h-  _  i' 

A  c.  \ 


t'qr 


fix  y 

/oT 


iai 


~  &<i 


{/a ~ 


with 


Vz 


a  =  1  +  -  +  2_L  _  ± 
o  %  2  2 


—  +  6p  +  ( 4+p  ^  m 7  ~ 


y  1  Vo 


+  4p 


n  -  d  A4 
P--  . 


194 


S  (-)  =  P  -i- 
12  T  12  . -  > 

/a 

o 


S22<7>  =  P 


22 


(14) 


(15) 


where  is  a  function  of  — ,  since  the  p  of  Table  2  may  be  written 

/A..  ,  /A.  b 

P  (~)  “  A  (-)  . 

The  filtering  steady  state  may  be  similarly  expressed  as 
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The  relationship  between  A/x  and  the  discrete  variances  is  illustrated 
in  Figs.  2-4.  If  we  choose  A/x  =  0.1  and  evaluate  (13)  and  (17),  we 
find 

Sn  (0.1) 

-  =  1.108028  , 


P. 


while 


p.  (0.1) 

dll  _ 

P11 


=  0.9070263  , 


or  S  (0.1)  -  P,  (0.1)  =  0.2  P,  .  Thus  we  suffer  a  10%  change  in 
1  dll 

the  steady-state  covariance  by  taking  10  samples  per  time  constant. 
In  any  case  we  will  compare  all  discrete  filters  *-o  each  other,  and 
we  can  assume  that  all  results  lie  within  10%  of  their  continuous 


limits . 


SAMPLING  INTERVAL  TO  FILTER  TIME  CONSTANT 


RATIO  OF  SAMPLING  INTERVAL  TO  TIME  CONSTANT 


SAMPLING  INTERVAL  TO  TIME  CONSTANT 
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C.  Application  of  Nonlinear  Filtering 


The  application  of  Bayes-Law  filtering,  developed  by  Bucy  and 
Senne  [5],  requires  special  considerations  for  each  special  case.  In 
the  problem  of  this  paper,  for  example,  the  dimension  of  the  driving 
noise  is  one  less  than  that  of  the  state  vector,  a  situation  for  which 
Bucy  and  Senne  indicate  there  results  in  a  computational  simplification 
for  Bayes  Law  implementation.  We  will  explore  that  claim  here  for  the 
second  order  phase  process. 

At  the  outset  consider  a  similar  but  unrealistic  phase  process 
with  two  driving  noises  u1  and  u2  so  that  (in  discrete  time) 

x  (n  +  1)  =  4>  x  (n)  +  F  u  (n) ,  (20) 

^eA  O' 

.0  A  _ 

and  the  remaining  quantities,  as  well  as  the  sensor  model  remain  the 

same  as  described  in  Table  2.  Now  if  we  let  the  variance  of  u  and  u, 

2  1 

be  q/A  and  take  the  limit  as  e  ->■  0  then  the  process  (20)  will  be 
identical  to  the  one  described  in  Table  2.  For  any  finite  e,  however, 
the  probability  density  function  of  the  driving  terms  T  u  will  take 
the  form 

1 


where 


u  = 


LU2 


pru  ^ 


-  exp  | -  j  |  !_5j  |  ,| 

2  it  det[A]  j  2  A-1  j 


(21) 


where 


a  =  rQrT  = 


|e”Aq  0 
0  Aq 

Similarly,  the  density  of  the  observation  noise  will  be  denoted 
Pv(£),  which  will  be  gaussian  with  covariance  R. 


If  we  review  the  Bayes  representation  theorem  solution  to  the 
discrete  filtering  problem  [4],  we  can  determine  that 

Jn«|n  <*>  "  i t  f f  V*  -  *i)Jn|n«d,1ldX2  • 

^  -CO  -00 

and 
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Now  if  we  take  the  limit  as  e  +  0  in  (22)  we  obtain 
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where  C  =  (2tt)  ^(qA)  In  other  words  th?  Bayes  integral  reduces 

to  an  integral  over  a  sufcspace  of  the  state  space  with  dimension  equal 
to  that  of  the  driving  noise  vector  (equals  one  in  this  case) .  The 


complete  Bayes  recursion  may  now  be  written 


|n<I>  " 


f  2qA 


(y2"x2>  Jn|nl 


l"X2A 


Jn|n<Z>  “  C2exPj-  ~[(zrcos  yx)2  +  (z2-sin  y !>2] J »  (26) 

where  both  C1  and  C2  are  normalizing  constants. 

Although  there  can  be  no  doubt  that  a  significant  simplification 

has  resulted  from  reducing  the  number  of  integrations  required  for  the 

Bayes  integral  computation,  it  must  be  pointed  out  that  the  arguments 

of  J  1  in  (25)  must  be  determined  to  lie  in  a  particular  subspace  of 
njn 

R2.  If  the  densities  have  been  stored  only  at  a  finite  number  of  points 


» 
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then  some  form  of  interpolation  of  the  densities  will  be  necessary, 
resulting  in  some  overhead,  so  that  the  computation  is  not  exactly 
equivalent  to  a  scalar  problem. 

Another  problem  of  concern  to  this  particular  application  is  the 
domain  of  the  conditional  probability  densities.  If  we  are  only 
interested  in  modulo-2ir  errors  for  the  reasons  stated  earlier  then 
there  is  no  loss  of  generality  to  map  all  modulo-2iT  intervals  of  the 
conditional  densities  bade  into  the  [-ti,  it)  interval  in  the  coordinate 
and  sum  up  the  individual  contributions.  Similarly,  because  we  are 
computing  the  phase  in  discrete-time  with  sample  time  A,  we  observe 
that  phase-rate  contributions  oucside  the  interval  [— ~  ,  ~)  will  have 
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the  same  effect  on  the  next  phase  as  their  modulo—^-  component. 
Accordingly,  we  may  map  the  phase-rate  x?  components  of  the  conditional 
densities  back  into  [--  ,  for  the  same  reason  as  above.  The 
combination  of  the  two  mappings  results  in  an  equivalence  between  the 
"cyclic"  state  space  and  a  torus  (see  Fig.  5). 

Next  we  consider  what  simplifications  arise  for  the  Bayes  integral 
update  (25) -(26)  as  a  consequence  of  the  cyclic  mapping  of  the  state 
space.  We  begin  by  combining  (25)  and  (26)  into  a  3ingle  equation 
representing  the  filter  update,  and  absorb  all  non  state-dependent 
terms  of  the  quadratic  exponent  expansions  info  a  single  normalizing 
constant  Cq,  giving 


where 


,  (Z,(n+i)cos  y,  +  Z.,(n+l)sin  y, 

A  _  I  .  1  1 

Vi'3’’  =  Vxp  | 

'  r/A 


<, 


(28) 
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If  we  now  modulate  the  density  Jn|n  as  described  above,  we  obtain 

ulYiiu., 


/o  +  27ik' 


(29) 


T  + 


k--«  l=-co  \-  A  / 

with  Tr<q<7r,  and  —  1  T  <  J  •  Finally  we  use  the  definition  (29)  on 

Jn+1 1 n+1  *  311(1  substitute  (27),  resulting  in  the  following  manipulations  [2]; 

n+l|n+1C)"  ??  W°+2’k>/  **f  }~iU±iii/-A~ll)2jj„| (30) 


k  1 


/ 


=  Sn+1<‘> 


EEE/ 


JL+2™ 

A  A 


exp 


k  1  m 


-it  27im 

r~r 


i-(T+2nl/A-u)2)  /o-pA+27ik\ 

l  2qA  JJn|n^  ^ 


(let  5  -  M  -  1H) 
A 


jl 

,  ,  V 2^  jJn|n  (  ,  « 

k  1  m  -7T  ’  \  r  .  2  Tim  / 

A  \  ?+~  / 

(let  i  =>  1-m,  j  =  1-i) 


i 

* 


=  Sn+1< 


'  ’  ‘  \  2iri 


i  j  k  -jr 
A 


(Fubini’s  theorem  to  interchange  52  and J" ) 


‘  i.«W/‘  a<T-{)  2n|„  f  t4)»« 

-7T  \  S  / 


Ol) 


“IT 

A 


« +¥ 
A 


203 


where 


co 


(32) 


The  result  (31)  represents  the  exact  recurrence  relation  required 

for  the  cyclic  conditional  density.  The  individual  terms  of  a(*)  will 

drop  rapidly  on  either  side  of  t-£,  depending  only  on  the  variance  qA. 

Having  constructed  a  density  function  updating  formula  for  the 

phase  estimation  problem,  the  question  remains  -  what  form  shall  the 

phase  estimate  itself  take?  It  is  clear  that  the  conditional  mean  (the 

goal  of  the  phase-locked  loop)  is  an  admissible  candidate.  The  cost 

criterion  that  is  minimized  by  the  conditional  mean,  however,  is  the 

mean  squared  error.  It  is  not  obvious  that  mean  squared  error  is  the 

best  criterion  for  choosing  estimates  modulo  2tt .  In  fact,  a  periodic 

cost  function  of  the  form 

L(e)  -  2(1  -  cos  e)  (33) 

might  be  more  appropriate  than  e2  if  only  modulo-2Ti-errors  are  important. 

The  cyclic  loss  (33)  looks  like  e2  for  smell  e  and  (e-2kiT)2  for  e  close 

to  2ku  for  all  k.  Moreover  we  may  easxly  show  that  the  estimate  x* 

n 

which  minimizes 

7t  JL 

E(L(x*-x^)!Zn]  =J  j  A  L(T-xJ)3n(T,o)dodT  '  (34) 

-IT 

A 


is  given  by 


x*  =  tan_1|E[sin(xn)|ZnjyE[cos(xn)|Zn]| 


(35) 


For  a  proof  of  (35)  see  [7]. 

Of  course  the  cyclic  loss  may  not  be  the  only  desire  ;le  loss  function 


for  the  phase  demodulator.  But  many  other  proposed  criteria  would  be 
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minimized  by  appropriate  functions  of  the  conditional  phase  densities, 
resulting  in  extremely  great  flexibility  for  the  experiment  designer. 
Moreover,  it  is  seen  that  the  conditional  expected  loss  (34)  as  well  as 
the  densities  themselves  are  computable  in  addition  to  the  estimates  so 
that  a  considerable  amount  of.  quantitative  information  is  available  to 
provide  a  realistic  assessment  of  the  quality  of  the  estimates,  regardless 
of  the  cost  criterion  employed.  No  such  information  is  provided  by  the 
phase-locked  loop  -  especially  after  steady-state  is  attained. 


205 


References 


[1]  M.  Abramowitz  and  I. A.  Stegun,  Handbook  of  Mathematical  Functions, 
Dover,  New  York,  1965, 

[2]  R.S.  Bucy,  "Building  and  Evaluating  Non-Linear  Filters,"  To  appear 
Proc,  Symp.  on  Appl.  Math.;  Stochastic  Diff.  Eqns.,  Am.  Hath.  Soc., 
April  1972. 

[3]  R.S.  Bucy,  C.  Hecht,  and  K.D.  Senne,  "Optimal  Phase  Demodulation 
via  Discrete  Nonlinear  Filtering,"  Air  Force  Weapons  Laboratory 
Computer  Films  No.  72-0401-01,  April  1972. 

[4]  R.S.  Bucy  and  P.D.  Joseph,  Filtering  for  Stochastic  Processes  with 
Applications  to  Guidance,  Wiley  Interscience,  New  York,  1968. 

[5]  R.S.  Bucy  and  K.D.  Senne,  "Digital  Synthesis  of  Nonlinear  Filters," 
Automatica,  7  (1971),  278-298. 

[6]  C.  Hecht,  "Synthesis  and  Realization  of  Nonlinear  Filters,"  Ph.D. 
Dissertation,  University  of  Southern  California,  January  1972. 

[7]  A. J .  Mallinckrodt,  R.S.  Bucy,  and  S.7.  Cheng,  "Final  Report  for  a 
Design  Study  for  an  Optimal  Nonlinear  Receiver-Demodulator,"  NASA 
Contract  NAS5-10789,  Goddard  Space  Flight  Center,  Maryland,  1970. 

[8]  A. J .  Viterbi,  Principles  of  Coherent  Communication,  McGraw  Hill, 

New  York,  1966. 


207 


Appendix  A.  Numerical  Experiments  With  The  Phase-Locked  Loop 

In  order  to  provide  an  accurate  check  on  the  value  of  the  nonlinear 
filters,  it  was  necessary  to  perform  extensive  Monte  Carlo  tests  on  the 
phase-locked  loop  (i.e.  the  steady-state  linearized  filter).  Since  the 
discrete  phase-locked  loop  operates  very  fast  (over  1000  estimates  per 
second  on  the  CDC  6600),  it  was  possible  to  average  estimates  over 
5000  sample  paths  of  length  130  in  10  minutes.  If  three  time  constants 
are  discarded  (30  samples)  in  each  sample  path  the  resulting  100 
estimates  represent  steady-state.  If  all  of  the  steady-state  errors 
were  averaged  this  would  lead  to  500000  monte  carlos  of  the  steady- 
state  error.  On  the  other  hand,  since  adjacent  errors  are  correlated, 
the  effective  Monte  Carlo  length  would  better  be  set  between  N=50000 
(one  estimate  per  time  constant)  and  N=500000  (every  estimate  included). 
Thus,  we  may  determine  the  three-standard  deviation  confidence  bands 
based  on  both  values  of  N. 

The  independent  parameter  for  the  different  phase-locked  loop 
cases  considered  was  p^ ,  the  steady-state  continuous  Ricatti  equation 
solution  for  the  phase  error  variance,  pj  j  =/~2rV4q1/1*  is  shown  by 
Viterbi  [8]  to  be  the  inverse  of  the  effective  signal  to  noise  ratio, 
or  N/S.  The  initial  condition  for  the  matrix  Ricatti  equation  can  be 
taken  to  be  its  sceady-state  value*,  and  the  mean-squared  error  will 

*An  alternative  initial  condition  of  Pi](0)=4Pn  was  used  in  Appendix 
B,  where  it  was  discovered  that  there  is  a  significance  dependence 
between  the  initial  condition  and  the  effective  time  constant  of  the  loop. 
Thus  the  time  require'  to  reach  steady-state  from  the  large  initial 
condition  is  frequently  too  long  for  an  effective  Monte  Carlo  analysis. 

Preceding  page  blank 
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then  only  be  a  function  of  M/S  and  not  of  q.  A  value  of  0.01  was 

chosen  for  q,  and  thus  the  value  of  r  was  taken  to  be  (pj  (22'^3q1  ^3) . 

Now  if  the  phase-locked  loop  were  linear  with  H= [ — 1  0],  then  p^ 

would  be  the  steady-steady  mean-squared  error.  An  equivalent 

interpretation  would  be  to  consider  p  as  the  state-state  mean- 

U 

squared  error  of  a  filter  operating  on  the  linear  measurements  z  =  Hx  +  v 
The  latter  interpretation  verifies  that  in  all  cases  the  mean-squared 
error  of  the  actual  loop  can  be  no  lower  than  p^.  If  we  are 
interested  in  modulo-27r  errors  we  may  convert  the  lower  bound  into 


mean-modulo-2Tr-squared  error  by  the  equation 

_P2 


CJ2 
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•£[ 


■/ei 


-TT 


(e+?r)  mod  2m 

- (e+2ku ) 2 


de 


exp 


2p 


11 


de, 


(A-l) 


where  o2  A  mean-modu.lo-2-squared  error.  The  result  o2  of  (A-l) 
m  “  m 

is  plotted  in  Fig.  A-l  as  well  as  the  equivalent  discrete  filter 
result  2  (based  on  P,  ),  ctj  was  determined  by  the  Newton 
integration  based  on  1U00G  points  in  the  interval  [-tt,ti)  and  15 
standard  deviations  taken  from  the  infinite  sum. 

The  deviation  the  Monte-Carlo  performance  of  the  phase-locked 
loop  from  the  ideal  (linear)  analysis  can  also  be  seen  in  Fig.  A-l 
to  be  insignificant  below  N/S  =  -lOdb  or  above  +8db.  The  significant 
departure  in  the  middle  region  is  a  result  of  the  "cycle-slip" 
phenomena,  whereby  the  unmodulated  phase-error  density  begins  to 
contain  substantial  probability  in  the  secondary  modes  [8].  For 
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extremely  high  noise  situations,  however,  modulation  of  the  error 

density  causes  the  asymptotic  error  density  to  become  uniform  on 

2/ 

[—it , it)  ,  resulting  in  an  asymptotic  variance  of  tt  /  3=  3.29  (or  5.19 
dB)  for  N/S  large. 

The  point  at  which  the  -lOdB  departure  between  the  two  curves 

occurs  is  often  referred  to  as  "threshold",  where  unlock  of  the  loop 

begins  to  cause  problems.  Almost  all  engineering  modifications  to 

the  basic  loop  are  designed  for  the  purpose  of  extending  the  threshold. 

It  is  clear  that  all  nonlinear  filters  must  have  modulo-27t  error 

variance  in  the  region  between  the  two  curves  (discrete  phase-locked 

loop  and  discrete-ideal).  Thus  the  problem  of  threshold  extension  is 

equivalent  to  reducing  steady-state  error  variance,  which  attains  the 

maximum  potential  improvement  of  about  4dB  at  -1.8dB  N/S. 

The  confidence  in  the  Monte  Carlos  of  the  phase-locked  loop  can 

be  computed  using  the  results  from  Chapter  IV.  Accordingly,  we  observed 

2 

that  a  value  for  =  p^/p2  must  be  determined,  since  the  three-standard 

deviation  confidence  bounds  on  variance  depend  on  p2*  Thus  our  Mcnte 

Carlo  simulations  involved  the  calculation  of  an  estimate  p.  of  y 

4  4 

and  the  estimate  u2  of  u2>  Then  we  plotted  the  ratio  ~  C>2  in 

Fig.  A-2.  From  the  figure  we  determine  that  a  good  upper  bound  on 
p?  is  given  by  5.4,  which  is  achieved  near  -5.0  dB  N/S.  A  summary  of 
the  estimate  3o  confidence  bands  is  given  in  Table  A-l,  where  we 
observe  that  the  maximum  confidence  band  width  is  from  i0.038  dB 
to  J-0.120  dB,  depending  on  whether  N  was  taken  as  500000  (the  actual 
number  of  errors  averaged)  or  50000  (an  approximation  to  the  equivalent 
number  of  independent  errors  used).  Thus,  the  Monte  Carlo  experiment 
of  the  phase-locked  loop  is  more  than  good  enough  for  a  reliable 


bench-mark  performance. 
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Table  A-l 

Confidence 

Intervals  for 

Linearized  Filter 

N/S(dB) 

H2 

P 

2  Pf 

Confidence 
N=50000  N= 

=5(00$00 

-14.0 

.0372 

.00424 

3.06 

±0.0828 

±0.0264 

-12.0 

.0599 

.0114 

3.16 

±0.0848 

±0.0270 

-10.0 

.0993 

.0321 

3.26 

±0.0867 

±0.0276 

-8.00 

.173 

.125 

4.18 

±0.103 

±0.0327 

-6.00 

.343 

.632 

5.^7 

±0.120 

±0.0383 

-5.19 

.480 

1.24 

5.38 

±0.120 

±0.0384 

-5.00 

.520 

1.43 

5.29 

±0.119 

±0.0380 

-4.59 

.606 

1.83 

4.98 

±0.115 

±0.0366 

-4.01 

.747 

2.53 

4.53 

±0.108 

±0.0345 

-3.01 

1.05 

4.23 

3.84 

±0.0971 

±0.0309 

-2.50 

1.17 

4.95 

3.62 

±0.0933 

+0.0297 

t 

-2.00 

1.36 

6.08 

3.29 

±0.0873 

±0.0278 

-1.94 

1.39 

6.26 

3.24 

±0.0863 

±0.0275 

•-1.00 

1.67 

8.02 

2.88 

±0.0792 

±0.0252 

0.00 

1.92 

9.67 

2.62 

±0.0735 

±0.0234 

1.00 

2.14 

11.1 

2.42 

±0.0689 

±0.0219 

2.00 

2.36 

12.6 

2.26 

±0.0649 

±0.0206 

3.00 

2.53 

13.9 

2.17 

±0.0626 

±0.0199 

4.00 

2.67 

14.8 

2.08 

±0.0601 

±0.0191 

5.00 

2.77 

15.6 

2.03 

±0.0587 

±0.0187 

6.00' 

2.87 

16.4 

1.99 

±0.0576 

±0.0183 

8.00 

3.04 

17.6 

1.90 

±0.0549 

±0.0174 

10.00 

3.12 

18.1 

1.86 

±0.0537 

±0.0171 

Appendix  B.  Numerical  Experiments  With  The  Hermite  Polynomial  Expansion 

The  objective  of  the  numerical  methods  investigated  in  this  re¬ 
search  were  to; 

1)  devise  methods  to  realistically  (small  computation  time  per  esti¬ 
mate)  solve  the  nonlinear  filter  problem  using  contemporary  computers, 
and 

2)  to  demonstrate  that  a  practical  nonlinear  filter  can  be  constructed 
to  take  advantage  of  its  inherent  accuracy  as  compared  to  a  linear 
filter.  Both  objectives  were  achieved  using  the  methods  of  the  previous 
sections  and  demonstrated  with  the  programs  given  (Hecht  [6]). 

The  initial  numerical  data  given  is  that  generated  at  the  University 
of  Southern  California,  using  an  IBM  360-65- computer,  and  reported  in  [6], 
The  subsequent  data  was  generated  at  Kirtland  AFB,  New  Mexico,  using  a 
CDC  6600  computer  and  is  reported  here  for  the  first  time. 

The  first  goal,  reduced  computation  time,  was  demonstrated  by  com¬ 
parison  to  Bucy  and  Senne  [5],  where  a  two-dimension  problem,  roughly 
equivalent  to  the  phase-lock  problem,  was  solved.  In  the  Bucy  and 
Senne  paper  sophisticated  techniques  were  used  to  reduce  the  number  of 
computations  per  estimate,  reducing  the  computation  time  by  a  factor  of 
200.  The  reduced  number  of  computations,  per  estimate,  was  approximately 

3 

13x10  .  For  the  filter  using  Hermite  expansions,  described  in  this 

2 

paper,  the  number  of  computations  per  estimate  was  18  x  10  .  However, 
the  critical  calculation,  the  evaluation  of  an  exponential  function, 
had  to  be  done  only  100  times  per  estimate.  It  would  be  indicated, 
therefore,  that  a  time  improvement  of  a  factor  of  130  could  be  expected. 
The  Bucy  and  Senne  paper  gave  results  using  a  Burroughs  B5500  computer, 
and  had  a  measured  time  per  estimate  of  45  seconds.  The  Hermite 
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numerical  results,  given  in  this  chapter,  were  obtained  using  an 
IBM  360  Model  *5  computer.  For  the  Monte  Carlo  experiments  described 
in  this  Appendix  there  were  203  sample  functions,  each  consisting  of 
130  points,  which  ran  in  120  minutes,  or  approximately  0.273  seconds 
per  estimate.  Assuming  the  Burroughs  B5500  was  approximately  equiva¬ 
lent  to  the  IBM  360-65,  there  was  a  measured  improvement  of  165  times. 
The  measured  time  on  the  IBM  360-65  was  just  about  what  was  predicted 
in  the  referenced  paper  for  a  parallel  processing  computer,  if  such  a 
computer  should  become  available  at  a  future  date. 

The  second  goal,  simulation  and  demonstration  of  accuracy,  was 
accomplished  by  comparing  the  nonlinear  filter,  described  in  Section  C, 
with  the  relinearized  filter  described  in  Section  B.  The  Monte 
Carlo  results  obtained  were  possible  only  because  of  the  efficiency  of 
the  Hermite  method.  In  the  following  descriptive  material  the  filter 
of  Section  B  is  referred  to  as  the  "linear"  filter,  and  that  of  Section 
C  as  the  "nonlinear"  filter.  Both  the  linear  and  the  nonlinear  filters 
were  designed  to  estimate  the  phase  angle  for  the  phase  coherent 
communications  problem. 

As  indicated  in  the  discussion  of  Section  B,  the  estimate  of  the 
phase  angle  is  required  modulo  2n.  The  construction  of  both  filters 
was  such  as  to  attempt  to  track  the  absolute  phase  angle.  When 
evaluating  the  filters,  however,  the  error,  modulo  2it,  was  the  value 
used. 

The  parameter, 

a  rs 

Pll  =  E[  (xj-xj)  (t)|z  ,v  =-»,t] 

was  the  independent  parameter  for  all  comparisons.  The  variances  of 
the  message-model  and  the  observation  noise  were  related  to  pu  as 
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was  shown  in  Section  B.  Selections  of  the  numerical  values  for  the 
initial  experiments  is  given  in  Table  B-l.  Viterbi  [5]  shows  the 
parameter  p^(0)  is  aiso  the  inverse  of  the  effective  signal  to  noise 
ratio,  N/S.  Thus,  p^(0)  had  the  two  physical  interpretations: 

1)  equilibrium  error  variance. 

2)  effective  noise  to  signal  ratio. 

A  preliminary  test  was  made  to  verify  that  both  the  linear  and 
nonlinear  filters  were  working  properly.  For  small  p  one  would 
expect  both  filters  to  give  equal  results.  A  sample  function  of  130 
points  (13  filter  time  constants)  was  generated  for  p  ^2  =  ,1,  .01, 
and  .004;  the  computer  listings  were  given  in  Hecht  [3].  The  following 
results  were  noted: 

1)  The  sequence  of  estimates  for  the  linear  and  nonlinear  filter  agree 

with  each  other  to  about 

10  2  radian  for  p  V2  =  .1 
11 

10  ^  radian  for  p  =  .01 

11 

10  radian  for  p  J/2  =  .004 

11 

2)  The  measured  variances  and  errors  agree  to  within  the  same  preci¬ 
sion  as  the  estimates. 

3)  The  equilibrium  computed  and  measured  variances  for  the  two  filters 
agree  with  each  other  and  with  the  linear  computed  value  using  the 
equations  of  Section  B. 

The  nonlinear  filter  was  tested  at  p  =  .55  (p  =  .3025,  N/S  = 

11  11 

-5.2  db)  where  the  difference  between  the  measured  and  theoretical 
linear  variance  was  3.5  db.  The  nonlinear  filter  simulation  test  at 
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Table  B-l 


Numerical  Values  Used  for  Computer  Simulations 

A  =  time  between  samples 

F  =  filter  time  constant 

=  0.1 

P  (0)  =  equilibrium  continuous  linear  position  error  variance 

11 

^22(0)  =  ecluilibrium  continuous  linear  velocity  error  variance 

E(x1(0))  =  0 

E(x2(0))  =  0 

E(Xl2(0))  =  4Pn(0) 

E(x22(0))  =  4P22(0) 

q  =  continuous  message  driving  variance  =  0.01 

Number  of  points  in  each  dimension  for  Gauss-Hermite  numerical 
integration  =  10 


Highest  order  of  terms  in  series  expansion  of  density  function  =  5 
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N/S  =  5.2  was  under  the  same  conditions  as  the  linear  filter,  i.e., 
same  noise  sequences,  but  only  200  sample  functions.  The  phase  error 
variance  for  the  nonlinear  filter  at  these  conditions  was  -3.55  db,  or 
1.45  db  better  than  the  linear  filter.  This  point  is  also  shown  in 
Figure  B-l. 

The  cumulative  average  variance  was  plotted  for  both  filters  to 
show  the  stabilization  of  the  average  as  a  function  of  number  of 
sample  functions,  Figure  B-2.  Points  are  plotted  for  every  fifth 
sample  function  up  until  sample  function  number  100,  then  one  final 
point  at  sample  function  number  200. 

Large  errors  for  both  the  linear  and  nonlinear  filters  for  the 
phase  angle  problem  are  due  to  the  phenomena  of  "cycle  slippage," 
which  occurs  more  frequently  as  N/S  gets  larger.  The  improvement 
in  performance  of  the  nonlinear  filter  was  due,  primarily,  to  the 
ability  of  the  nonlinear  filter  to  reduce  the  number  of  cycle  slips. 
Sample  function  number  6  was  identified  as  one  in  which  the  linear 
filter  slipped  several  cycles  whereas  the  nonlinear  filter  held  on. 

The  sequence  of  estimates  and  errors  for  both  filters  for  this  sample 
function  were  analyzed  and  a  portion  of  sample  function  number  6 
(from  about  N  =  15  to  52)  is  t'bown  in  Figure  B-3  and  the  absolute 
value  of  the  error,  modulo  2n, is  plotted  in  Figure  B-4.  The  figures 
show  the  linear  filter  has  slipped  a  cycle  at  N=35,  and  is  slipping 
a  second  cycle  at  N=50.  The  nonlinear  filter,  at  the  same  time,  has 
a  large  error  at  N=35  but  appears  to  recover  nicely  and  by  N=50 
it  is  tracking  very  well. 


ASYMPTOTIC 


\T2  Fr/4Cr  ~  EFFECTIVE  N/S  IN  DB 


NUMBER  OF  SAMPLE  FUNCT 


DISCRETE 


Portion  of  Sample  Function  No 
P, , (o)  =  0.3025 


\ 


os  ^ 

sj*  —  ^ 

r— 

Ll! 

CD  h- 
K)  UJ 
(Z 
U 
in 
CM  q 


avd-^z  oinaoiM 
d0dd3  jo  3mvA  3imosev 


TIME  IN  FILTER  TIME  CONSTANTS 


222 

It  can  be  seen  that  the  measured  variance  in  Figure  B-l  is 
significantly  different  than  the  corresponding  curve  given  in 
Appendix  A  (Figure  A-l).  Investigations  showed  that  the  initial 
conditions  affected  the  variance,  for  substantially  longer  than  3 
time  constants,  as  was  originally  assumed.  The  variances  given 
in  Figure  B-l  were  based  on  the  variance  of  the  initial  estimate  being 
four  times  the  equilibrium  variance,  whereas  Figure  A-l  was  based  on 
the  initial  variance  set  equal  to  the  equilibrium  variance. 

To  demonstrate  that  the  equilibrium  solution  could  be  achieved 
independent  of  the  starting  condition,  one  long  sequence  was  run 
(83,000  points)  starting  at  four  times  the  equilibrium  value.  This 
is  shown  in  Figure  B-5,  where  markers  are  inserted  to  show  the  result 
for  the  two  curves  previously  mentioned  for  the  conditions  given  on 
the  graph.  After  about  8000  time  constants  (80,000  points)  the 
cumulative  average  appears  to  be  approaching  the  solution  given  in 
Figure  A-l,  which  was  based  on  5000  Monte  Carlo  sample  functions  of 
130  points  each,  starting  at  the  equilibrium  value. 

With  this  new  knowledge,  the  Hermite  nonlinear  filter  was  again 
evaluated  with  400  Monte  Carlo  functions  with  the  same  parameters 
except  that  the  starting  values  were  the  computed  equilibrium  values. 
This  new  sequence  was  compared  to  the  phase-lock  results  for  an 
identical  set  of  sequences.  The  cumulative  errors  for  the  two  filters 
are  plotted  in  Figure  B-6,  in  a  manner  similar  to  Figure  B-2.  Both  the 
phase-lock  and  the  nonlinear  filter  had  smaller  errors  than  before. 

The  phase-lock  for  400  functions  was  -3.10  db  and  the  nonlinear  filter 
was  -4.06  db,  for  an  improvement  of  0.96  db.  Because  of  the  larger 
number  of  samples  the  3a  confidence  probability  improved  to  about 
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±0.07  for  both  filters.  It  was  noted  that  the  phase-lock  error 
for  400  functions  was  0.49  (-3.10  db)  as  compared  to  0.50  (-3.0  db) 
for  5000  functions,  or  within  .02  (.01/. 49  =  .0204)  of  what  might  be 
considered  the  "correct"  error. 

The  above  data  was  generated  on  the  CDC  6600  computer,  and  it  was 
noted  the  nonlinear  filter  (Hermite  expansion)  computed  the  estimates 
at  a  rate  of  .127  Sec/estimate,  which  was  more  than  two  times  faster 
than  the  previous  computer. 

In  conclusion  we  note  that  a  practical  two-dimensional  nonlinear 
filter  was  simulated  using  a  digital  computer.  The  digital  filter  er¬ 
ror  variance  was  within  approximately  10%  of  the  continuous  model. 

Using  Gauss-Hermite  integration  and  Hermite  Series  expansions  the  non¬ 
linear  filter  computed  solutions  to  a  phase  angle  problem  at  the  mea¬ 
sured  rate  of  0.273  seconds  per  estimate  on  a  medium  speed  contemporary 
computer  and  .127  seconds  per  estimate  on  a  high  speed  computer,  which 
was  165  times  faster  than  a  roughly  equivalent  problem  using  the  most 
advanced  digital  techniques  prior  to  this  paper.  The  phase  angle 
problem  solved  was  a  model  of  an  existing  type  of  communications 
receiver  which  presently  uses  linearization  methods  to  handle  the  non- 
linearities.  The  simulated  nonlinear  filter  using  Hermite  expansions 
showed  an  error  variance  reduction  of  .96  db  at  moderately  high  noise 
to  signal  ratio,  with  greater  reductions  at  higher  noise  levels  shown 
for  other  nonlinear  filter  methods.  (See  the  discussion  in  the 


following  Appendices.) 
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Therefore,  only  those  integers  were  used  where 

Max  F(y  ,n)  >  ~  10"20  . 
y2.n 

The  program  that  was  developed  makes  the  above  test  on  F(y2,n),  and  in 
all  of  the  results  to  date  only  the  value  v2  =  0  has  been  found  to  be 
significant. 

F(y2,n)  is  not  a  function  of  n,  the  integer  time,  and  therefore 
was  computed  only  one  time  in  advance  and  stored  for  use  in  (C-l)  for 
all  n.  A  further  simplification  was  made  by  taking  advantage  of  the  fact 
that  F(y2>n)  is  only  a  function  of  y2-q.  That  is,  after  discretizing 
the  argument  y2>  F(y2,n)  is  computed  for  a  range  of  values  of  y2~n, 
rather  than  for  all  combinations  of  y2  and  n.  In  the  sequel  we  let 
the  running  variable  n  be  called  x*. 

The  2-dimensional  interval 

—IT  <_  Xj  <  TT  , 


is  divided  into  m  and  n  equally  spaced  sub-intervals,  respectively, 
and  each  sub-interval  is  defined  by  a  point  on  its  center,  yt  ,  y2  , 

i  j 

with  i-  1, . . .,mand  j=  1, . .  .,n.  The  points  y  and  y  defines 

i  j 

grid  which  remains  constant  with  respect  to  time,  and  the  density 
function  is  represented  by  point  masses  defined  only  on  this  grid,  where 
the  magnitude  of  the  point  masses  approximates  the  density  at  that  point. 
From  (C-l),  the  points  x*  =  y„  (j  =  l,n);  that  is,  for  each  integer 

j  j 

j,  the  two  grid  points  are  identical. 


The  integrand  in  (C-l)  needs  to  be  evaluated  only  for  those  x* 

2j 

where  Max  F(y2  ,  x*  )  >  approximately  10“20.  For  computing 

y  2  -X2  ^  j 

i  j 

./M 

an  updated  density  function  J  the  integrand  needs  to  be 


evaluated  only  a  small  number  of  times  for  each  y  (typically  10 

,  c  Zi 

times  for  n  =  200) . 

The  main  difficulty  in  the  mechanization  of  (C-l)  is  in  deteraining 

-  b\  ~4  A\ 

Jn-l  (  1  J  )*  having  the  prior  value  of  this  function  available  only 


at  a  discrete  set  of  in  points  in  the  first  argument,  different  than 
those  defined  by  y  -x*  A. 

i  j 

One  approach  to  solving  this  problem  is  as  follows.  Let 

xit  ■  -  * +  2”  ^  + 1  (t )  1  = 


(C-2) 


j  =  l.n 


as  defined  above. 
Now, 


-  -x*  t 

h  S 


=  -7T 

+  2tt 
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°  2ir 

(±=i. 

-  m 

n  / 

=>  2ti 

(~- 
\  m 

r  i .  2u  (,1-D 

,  1 
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[  A  +  A  n“ 

h2 

(— )J 

(C-3) 


We  want  to  use  the  integers  i  and  j  to  define  a  new  grid  point 
k,  on  the  x^  axis;  the  k  grid  point  must  agree  with  an  original  Xj 
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grid  point.  That  is,  from  (8)  and  (9)  we  want 


x  =  y  —x  A 
‘k  >i  £j 


or 


-* +  2*  (¥)  - 1  (v) ' 2j  [(v-)  -  (■‘r)] +  *  (s  -  i)  <c-4) 


from  which 


i  j  m  .  .  1  /m  ,  \ 


(C-5) 


In  the  initial  approach  to  this  problem,  when  k  as  computed  from 
(C-5)  v/as  not  an  integer,  the  nearest  integer  value  was  selected.  In 
order  to  assure  an  x^  grid  point  falling  exactly  on  Xj  =  0  m  must 
be  an  odd  integer.  It  is  also  desirable  to  have  k  be  correct  (an 
integer  value  from  (C-5))  when  the  x*  grid  point  j  is  in  the  center 
of  its  range,  requiring  n  to  be  an  odd  number.  To  meet  the  above 
requirements,  and  to  subdivide  such  that  other  points,  k,  might  match 
exactly  with  some  i,  the  ratio  ^  should  be  an  odd  integer,  which 
also  assures  that  no  k  point  will  fall  exactly  in  the  middle  of  two 
adjacent  i  points. 

A  modification  was  made  to  the  formula  for  computing  k,  to 
account  for  the  possibility  of  k,  as  determined  from  (C-5),  not  falling 
in  the  range  of  (l,m) . 

For 


k>ra  k*  =  k  -  m 

k<l  k*=k+m 


( C— 6) 


Equation  (C-6)  is  equivalent  to  folding  back  on  the 
remain  within  the  Interval  [ — n , tt ) . 
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The  above  technique  works  satisfactorily  but  requires  a  large 
number  of  grid  points  to  stabilize  on  the  "true"  nonlinear  estimate, 
necessitating  substantial  computer  expense.  To  test  for  convergence, 
a  fixed  noise  sequence  was  generated  and  the  filter  was  used  to  estimate 
the  state  with  the  number  of  grid  points  progressively  increased  until 
increasing  the  number  of  points  caused  no  change  in  the  sequence  of 
estimates.  It  was  subsequently  learned  that  the  number  of  points 
needed  for  convergence  varied  as  a  function  of  the  N/S  ratio. 
Substantial  data  was  generated  with  the  above  method.  Fig.  C-l  shows 
the  phase  error  variance  as  the  upper  of  the  two  curves,  based  on 
Monte  Carlos  of  200  independent  sample  paths  of  100  steady  state  points. 
The  3a  confidence  intervals  represent  2000  points  (one  each  time 
constant)  . 

An  improvement  which  significantly  reduced  the  computational 
requirements  was  to  let  k,  from  (11),  take  on  non-integer  values. 

The  value  of  the  density  function  at  these  places  was  evaluated  by 
linear  interpolation  between  the  two  adjacent  integer  values  of  k, 
where  the  density  function  was  available  from  the  prior  cycle.  The 
interpolation  was  required  only  in  the  Xj  direction,  since  the 
densities  in  the  x2  coordinate  direction  are  computed  at  exactly  the 
places  where  they  are  needed  for  the  recursion  formula.  For  inter¬ 
polating  between  k  =  m  and  k  =  1,  the  points  m  and  1  were 
considered  adjacent,  completing  a  circle.  The  convergence  test 
described  above  was  applied  to  the  modified  filter  and  stabilized  for 
a  substantially  smaller  number  of  grid  points.  The  lower  curve  of 
Fig.  C-l  was  generated  using  the  filter  wit’  interpolation. 
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The  two  curves  show  the  same  filter  error  variance  at  about 
N/S  =  0  dB,  which  was  in  the  vicinity  of  the  N/S  ratio  where  the 
convergence  tests  were  made  prior  to  introducing  the  interpolation. 

For  lower  N/S  the  interpolating  filter  shows  lower  errors,  being  about 
one  dB.  less  at  about  N/S  *  -4  dB.  The  numerical  granularity  associated 
with  non-interpolation  apparently  causes  significant  errors  at  the  lower 
N/S  ratios,  due  to  the  sharpness  of  the  phase  error  density  function. 

At  higher  N/S  ratios,  the  filter  error  density  is  so  diffuse  that  the 
numerical  errors  are  of  no  consequence. 

The  numerical  results  of  the  Monte  Carlo  experiments  for  the  cyclic 
point-mass  filter  are  given  in  Table  C-l.  Table  C-2  gives  the  associated 
improvement  of  the  cyclic  filter  over  the  phase-locked  loop  results 
reported  in  Appendix  A,  and  Table  C-3  gives  the  difference  between  the 
nonlinear  filters  and  the  ideal  linear  analysis.  In  all  cases  the  minus 
and  plus  3a  confidence  intervals  are  given,  where  for  2000  points  the 
lower  threshold  is  -0.394  dB  below  nominal  and  the  upper  threshold  is 
0.433  dB  above  the  nominal. 

Table  C-l.  Monte  Carlo  Mod  2ir  Error  Performance  Data  for 
the  Cyclic  Point  Mass  Estimates 


N/S 

(dB) 

Non-In terpolative 

MSE  (dB)  (Mod  2n) 

Low  Norn.  High 

Interpolative 

MSE  (dB)  (Mod  2  it) 

Low  Nom.  High 

-4.01 

-2.20 

-1.81 

-1.38 

-3.22 

-2.83 

-2.40 

-3.01 

-1.33 

-0.94 

-0.51 

-1.87 

-1.48 

-1.05 

-1.79 

-0.09 

0.30 

0.73 

— 

— 

— 

-1.70 

_ 

- - 

- - 

-0.38 

0.01 

0.44 

0.00 


1.46 


1.85 


2.28 
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Table  C-2.  Monte  Carlo  Improvements 
Cyclic  Point-Mass  over  Phase-Locked  Loop 


N/S 

(dB) 

Non-Interpolative 

MSE  (dB)  Improvement 

Low  .  Nom.  High 

MSE 

Low 

Interpolative 
(dB)  Improvement 
Nom.  High 

-4.01 

0.33 

0.76 

1.15 

\ 

1.75 

1.35 

1.78 

2.17 

-3.01 

0.93 

1.36 

1.47 

1.90 

2.29 

*-1.79 

1.02 

1.45 

1.84 

— 

— 

— 

**-1.70 

— 

— 

— 

1.41 

1.84 

2.23 

0.00 

0.75 

1.18 

1.57 

_ 

— 

*  Phase-Locked  Performance  read  from  graph  (1.75  dB) 
**  Phase-Locked  Performance  read  from  graph  (1.85  dB) 


Table  C-3.  Monte  Carlo 

Difference 

Between 

Cyclic  Point-Mass  and  Ideal  Linear 

N/S 

Non-Interpolative 

Interpolative 

(dB) 

MSE 

(dB)  Difference 

MSE  (dB)  Difference 

-4.01 

2.23 

2.62  3.05 

1.21  1.60 

2.03 

-3.01 

2.10 

2.49  2.92 

1.56  1.95 

2.38 

-1.79 

2.13 

2.52  2.95 

-  - 

— 

-1.70 

— 

-  - 

1.75  2.14 

2.57 

0.00 

1.9 

2.29  2.72 

-  - 

— 

The  summary 

of  these 

data  in  Fig.  C-l  is 

shown  with  the  phase-locked 

loop  performance  and  the  Idealized  Linear  reference  curve.  Fig.  C-2 
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shows  the  improvements  over  the  phase-locked  loop  and  Fig.  C-3  shows 
the  difference  between  nonlinear  and  ideal.  In  all  figures  the  Hermite 
point  from  Appendix  B  is  superimposed. 

In  addition  to  determining  the  Monte  Carlo  performance,  a  study 
was  done  to  determine  the  execution  times  for  various  grid  sizes  in  an 
effort  to  obtain  a  cost  versus  performance  comparison. 

The  results  of  Fig.  C-l  were  obtained  with  a  grid  of  m  =  21  and 
n  =  105.  The  number  p  is  evaluated  by  the  program  for  eacn  problem 
condition. and  represents  the  number  of  points  on  each  side  of  the  point 
(x  ,x2  )  in  the  x2  direction,  which  contribute  to  the  computation  of 

i  j 

the  density  of  that  point. 

An  estimate  of  the  time  required  to  update  the  density  function  was 
based  on  the  knowledge  that  the  time  was  roughly  proportional  to  the 
number  of  computations.  For  each  m,  n  it  requires  m  x  n  x  p  computa¬ 
tions.  The  measured  data  for  the  21  x  105  grid  case  was  about  0.215 
seconds  per  estimate  (as  compared  to  the  5-coefficient  Hermite  value  of 
0.121  seconds  per  estimate).  Using  the  above  scaling,  the  data  of 
Table  C-4  was  generated. 

To  determine  an  adequate  grid  size  many  runs  were  made  with  the 
same  random  sequence  inputs,  using  different  grid  size  combinations. 

All  runs  were  made  with  N/S  =  -1.7  dB,  Filter  Time  constant  =  5.13 
seconds,  samples  per  time  constant  =  10, /.q  -  .010  and  r  =  1.735.  The 
sequence  of  estimates  for  a  ten-increment  time  period  (integer  time  = 

31,  40)  were  compared  with  each  other.  It  was  desired  to  find  a  combina¬ 
tion  which  gave  reasonably  good  results  (as  compared  to  larger  grids) 
while  minimizing  the  time  per  estimate.  Table  C-5  gives  3  sequences 
with  the  grid  size  progressively  increasing  while  maintaining  the  ratio 

* 


X 


Difference  from  Ideal  Linear  Analysis 
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n/m  =  5.  Table  C-6  maintains  n  ■=  155  while  varying  the  ratio  n/m 
from  3  through  5.  Table  C-7  maintains  m  *  31  while  varying  the  ratio 
n/m  from  5  through  7.  -Analyzing  the  absolute  error  sequences  from 
Tables  C-5  through  C-7  we  note  that  for  n/m  =  5,  and  increasing  grid 
size,  the  errors  become  smaller  as  the  grid  size  increases,  with  very 
little  change  between  m  =  31  and  n  =  41,  and  insignificant  differences 
between  m  =  21  and  n  =  41.  For  n  =  155,  Table  C-6,  and  for  m  =  31, 
Table  C-7,  the  same  general  trend  was  observed.  That  is,  slight  but 
insignificant  improvement  with  increasing  grid  size.  For  Table  C-7, 
especially,  there  appears  to  be  no  advantage  in  increasing  m.  Tables 
C-5  and  C-6  suggest,  however,  that  ultimate  stabilization  can  be 
achieved  by  increasing  m  with  n  about  155.  The  choice  of  a  21  x  105 
grid  for  the  Monte  Carlo  experiments  was  based  on  a  compromise  between 
time  and  accuracy,  as  illustrated  in  the  Tables. 

Table  C-4.  Timing  Estimates 


Grid 

P 

m/n 

Time/Est 

Sec. 

•21  x 

105 

4 

5 

.215 

31  x 

155 

6 

5 

.700 

41  x 

205 

8 

5 

1.63 

31  x 

93 

4 

3 

.279 

31  x 

125 

5 

4 

.468 

31  x 

155 

6 

5 

.700 

31  x 

195 

8 

6 

1.17 

31  x 

217 

9 

7 

1.46 

21  x 

145 

6 

7 

.442 

25  x 

151 

6 

6 

.547 

31  x 

155 

6 

5 

.700 

39  x 

155 

6 

4 

.875 

51  x 

155 

6 

3 

1.15 

number 

of  lines  in 

X 

direction 

1 

n  =>  number  of  lines  in  x  direction 

2 

2p  +  1  =  number  of  computations  to  update  each  grid  point 


Table  C-5.  n/m  Constant 
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.215  Sec. 

.700 

Sec. 

1.63 

Sec. 

Time 

m  = 

21 

m  = 

31 

m  = 

41 

Est. 

n  = 

105 

n  = 

155 

n  = 

205 

Time 

X1 

A 

X 

1 

lv*il 

A 

X 

1 

A 

X 

1 

Ixj-x 

31 

-2.394 

3.094 

.795 

3 . 100 

.789 

3.106 

.783 

32 

-2.296 

-3.085 

.789 

-3.081 

,785 

-3.075 

.779 

33 

-2.244 

-2.146 

.098 

-2.216 

.028 

-2.234 

.010 

34 

-2.220 

-1.905 

.315 

-1.943 

.277 

-1.955 

.265 

35 

-2.210 

-2.146 

.064 

-2.163 

.047 

-2.172 

.038 

36 

-2.221 

-2.903 

.682 

-2.880 

.659 

-2.868 

.647 

37 

-2.222 

-1.908 

.314 

-1.912 

.310 

-1.921 

.301 

38 

-2.198 

-1.246 

.951 

-1.274 

.924 

-1.287 

.911 

39 

-2.162 

-1.312 

.850 

-1.330 

.832 

-1.343 

.819 

40 

-2.195 

-1.894 

.301 

-1.866 

.329 

-1.868 

.327 
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Table  C-6.  n  Constant 


n  =  155 


.700 

Sec. 

.875  Sec. 

1.15 

Sec. 

m  = 

31 

m  =  39 

m  = 

51 

E  - 
m 

5 

A 

n  .  . 

~  “  4 

m 

n  _ 

m 

3 

Time 

X1 

lvxil 

*1 

K-xJ 

X1 

|xrx 

31 

3.100 

.789 

3.105 

.734 

3.106 

.783 

32 

-3.081 

.785 

-3.077 

.781 

-3.076 

.780 

33 

-2.216 

.028 

-2.232 

.012 

-2.246 

.002 

34 

-1.943 

.277 

-1.954 

.266 

-1.960 

.260 

35 

-2.163 

.047 

-2.172 

.038 

-2.177 

.033 

36 

-2.880 

.659 

-2.870 

.649 

-2.867 

.646 

37 

-1.912 

.310 

-1.920 

.302 

-1.925 

.297 

38 

-1.274 

.924 

-1.287 

.911 

-1.290 

.908 

39 

-1.330 

.832 

-1.343 

.829 

-1.347 

m 

C-J 

co 

40 

-1.866 

.329 

-1.871 

.324 

-1.870 

.323 
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Table  C-7.  m  Constant 


m  =*  31 


.700 

Sec  • 

1.17  Sec. 

1.46 

Sec. 

m  « 

155 

m  -  195 

m  ~ 

217 

n 

- —  cs 

5 

—  =  6 

n  * 

—  t; 

7 

m 

m 

m 

A  1 

A  . 

A  |  /V  | 

A 

Time 

X1 

Iv^l 

X1 

X1 

|xrx 

31 

3.100 

.789 

3.100 

.789 

3.101 

.788 

32 

-3.081 

.785 

-3.081 

.785 

-3.081 

.785 

33 

-2.216 

.028 

-2.216 

.028 

-2.212 

.024 

34 

-1.943 

.277 

-1.943 

.277 

-1.942 

.278 

35 

-2.163 

.047 

-2.163 

.047 

-2.162 

.048 

36 

-2.880 

.659 

-2.879 

.658 

-2.878 

.657 

37 

-1.912 

.310 

-1.912 

.310 

-1.912 

.310 

38 

-1.274 

.924 

-1.274 

.924 

-1.275 

.923 

39 

-1.330 

.832 

-1.330 

.832 

-1.331 

.831 

40 

-1.866 

.329 

-1.867 

.328 

-1.869 

.326 
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Appendix  D»  A  Fourier  Series  Experiment 


Mallinckrodt,  Bucy,  and  Cheng  [7]  have  observed  the  fact  that  since 
the  cyclic  phase  density  is  periodic,  a  Fourier  Series  appears  appro¬ 
priate  for  representation  of  the  density  functions.  They  have  developed 
equations  for  the  evolution  of  the  Fourier  Series  for  the  one  dimensional 
problem.  We  extend  their  analysis  to  our  two-dimensional  problem  and 
present  the  preliminary  results  of  a  numerical  experiment  in  this 
appendix. 

We  begin  by  observing  that  an  arbitrary  function  J(y)  periodic  on 

7T  IT 

the  rectangle  -ir  <_  Xj  <  it,  -  —  <_  x2  <  may  be  represented  in  terms 
of  its  two  dimensional  Fourier  Series 


^  /yA  V*  V*  n  +im  y,  +i*Ay0 

Jn  ~  L  L  am*  6  1  e  2  * 

VJ  m  £ 


(D-l) 


where 


e-iUy2  dyidy2 


(D-2) 


Now  the  cyclic  density  obeys  the  recursion  relation 

£ 

-  /m  .  .  r  V  /vx  2A\ 


HJ '  s(Vi)l  m(v*2M  V )  dx> ' 


(D-3) 


where 


(  z  cos  y  +  z  sin  y  ) 
S(y,)  -  C0  exp  j - J75 - j  . 
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given  by 
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Using  the  equations  (D-12)  -  (D-14)  and  (D-16)  we  have  implemented 
an  example  of  the  Fourier  Series  filter  with  “5  <_  m  <_  5  and  -5  £  l  £  5 
for  a  total  of  11  x  11  a  121  coefficients.  Due  to  the  occurrences  of 
negative  mass  we  discovered  that  the  Fourier  Series  is  not  suitable  for 
low-noise  situations.  On  the  other  hand,  for  N/S  =  1  dB  and  q  =  0.1 
we  managed  to  get  quite  good  results.  This  may  be  seen  in  Fig.  C-l, 
where  the  result  of  the  raean-modulo-2TT-squared  error  of  the  Fourier 
Series  is  shown  in  conjunction  with  the  experimental  results  for  the 
phase-locked  loop  (Appendix  A) ,  the  Hermite  Expansion  (Appendix  B) ,  and 
the  point-mass  representation  (Appendix  C) .  The  nominal  Monte  Carlo 
result  for  the  Fourier  Series  at  N/S  =  1  dB  was  2,65  dB  with  a  200 
Monte  Carlo  3o  confidence  from  2.26  dB  to  3.08  dB.  This  result  is 
equivalent  to  a  nominal  Improvement  over  the  phase- locked  loop  of  0.84  dB 
with  3a  confidence  from  0.41  dB  to  1.23  dB.  Also,  the  difference 
between  Fourier  Series  performance  and  the  ideal  linear  was  nominally 
2.12  dB  with  confidence  interval  from  1.73  dB  to  2.55  dB. 

Although  the  above  Monte  Carlo  result  was  consistent  with  the 
previous  experiments  it  represents  a  preliminary  result,  since  we  still 
have  not  completely  isolated  a  solution  to  the  negative  mass  dilemma. 

More  results  will  follow  in  a  later  paper. 
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Appendix  E.  A  Movie  of  Conditional  Densities 

Just  as  we  have  demonstrated  the  value  of  visual  inspection  of 
conditional  densities  for  the  tracking  problem  in  the  past  (Chapter  VI), 
we  now  illustrate  the  wealth  of  information  contained  in  the  conditional 
densities  for  the  phase  demodulator.  In  this  appendix,  we  describe  a 
movie  made  of  cyclic  phase  densities  using  the  point-mass  method  described 
in  Appendix  C. 

The  parameters  used  for  the  sequence  in  the  movie  were  as  follows: 

N/S  =  0  dB  ,  q  =  0.1 
A  -  0.24 

The  initial  density  was  chosen  with  the  nominal  steady-state  value 
P  =  1.85  dB  (1.53),  and  the  grid  size  was  set  at  31  points  in  phase 
by  155  points  in  phase-rate.  The  isometric  views  of  the  densities, 
seen  in  Fig.  E-l,  are  shown  for  phase  over  the  entire  interval  [— tt,tt)  , 
but  phase  rate  is  shown  over  only  one  third  of  the  interval 
Thus  spillover  in  the  phase  rate  direction  is  not  lost,  but  merely  not 
shown.  In  the  initial  sequence  from  the  movie  [3],  shown  in  Fig.  E-l, 
many  features  may  be  observed.  Cycle  slips  in  phase  are  accompanied  by 
general  turbulence  of  the  density,  the  appearance  of  multiple  modes  and 
other  anomolies.  One  explanation  for  the  appearance  of  multiple  modes 
and  thu3  the  cycle  slips  is  the  occasional  major  disagreement  between 
t  .e  in-line  and  quadrature  measurement  components  z1  and  z2»  as  a 
result  of  the  independent  noises  v^  and  The  sequence  shown  in 

the  figure  illustrates,  though,  how  recovery  is  gradually  reaccomplished 
when  the  measurements  agree. 


Preceding  page  blank 


250 


Fig.  E-l.  A  Typical  Sample  Path  of 
Densities  Evolving  in  Time 


The  following  15  pages  constitute  Fig.  E-l.  The  densities  are 
taken  from  the  initial  condition  and  59  conditional  a  posteriori 


densities . 
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VIII.  Conclusions 

The  purpose  of  this  report  has  been  to  review  the  extensive 
research  concerning  numerical  methods  of  implementing  optimal  Bayes 
estimators  during  the  past  two  to  three  years.  It  is  anticipated  that 
the  contents  of  this  report  will  represent  only  a  small  fraction  of 
the  important  work  on  the  problem  in  the  next  few  years.  This  pre¬ 
diction  is  based  in  part  on  the  increasing  number  of  researchers  who 
have  turned  to  the  subject  during  the  last  two  years. 

We  interpret  the  effort  described  herein  as  a  pioneering  venture. 

We  have  not  been  able  to  answer  or,  in  fact,  ask  every  possible  question 
during  this  research.  We  have  demonstrated  the  present  day  feasibility 
of  implementing  excellent  numerical  approximations  to  nonlinear  filters. 
We  have  further  discussed  how  this  effort  can  be  extended  -  both  in  the 
realm  of  machine  design  and  in  the  area  of  practical  engineering 
application. 

Almost  any  example  involving  only  a  few  state  dimensions  would  be 
appropriate  for  this  investigation.  We  require  a  small  state  vector 
since  the  evaluation  of  nonlinear  estimators  requires  costly  Monte  Carlo 
simulations  -  a  topic  which  we  covered  in  great  detail  in  Chapter  IV0 
Another  advantage  of  a  small  state  vector  (less  than  or  equal  to  two 
dimensions)  is  the  accessibility  '  f  the  conditional  probability 
densities  to  visua1  inspection  via  computer  graphics.  We  have  demon¬ 
strated  through  the  use  of  two  movies  the  important  advantage  of  visual 
inspection  of  the  conditional  densities  evolving  in  time.  This  of 
course  emphasizes  the  point  that  the  nonlinear  estimate  is  only  a 
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byproduct  of  this  research  -  the  principal  concern  is  the  conditional 
density,  which  we  have  available  for  inspection  at  every  sample  time. 

It  would  be  incorrect  to  surmise,  however,  that  we  do  not  respect 
the  value  of  moment  series  expansions,  as  exemplified  by  the  extended 
Kalman-Bucy  filter,  for  example.  Rather,  we  accept  past  successes  at 
face  value  -  as  a  testimony  to  engineering  perseverance  and  diligence 
and  we  intend  to  supplement  these  efforts  with  some  sound,  systematic 
solutions  co  the  general  nonlinear  estimation  problem  which,  hopefully, 
will  provide  increased  understanding  of  the  characteristics  of  the 
optimal  estimator  in  a  variety  of  applications. 

At  the  same  time,  however,  it  would  be  pessimistic  to  consider  our 
methods  beyond  practicality,  or  non  "real-time".  We  have  no^ed 
frequently  in  the  present  report  that  only  development  time  lies 
between  the  present  ideas  and  their  real-time  realizations  in  digital 
machines.  The  fundamental  reason  for  our  certainty  for  this  conjecture 
is  the  existence  of  essentially  unlimited  parallelism  in  the  Bayes-Law 
computations.  We  contend  furthermore,  that  pilot  experiments  on  existing 
parallel  machines,  yet  to  be  performed,  will  prove  our  conjecture  that 
the  parallelism  in  these  algorithms  may  be  exploited  indefinitely,  that 

4' 

perhaps  Bayes  estimation  is  truly  the  first  non-academic  application  of 
parallel  machine  architecture. 

Of  course  one  important  question  which  we  have  raised  concerns  the 
identification  of  suitable  example  problems  which  illustrate  the  unique 
characteristics  of  nonlinear  estimators.  It  is  obvious  that  our  efforts 
would  be  partially  unnecessary  if  it  were  clear  at  the  outset  what 
examples  were  appropriate.  We  have  presented  two  important  applications 
for  the  theory,  picked  by  and  large  at  random,  problems  which  represent 
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some  of  the  most  significant  challenges  to  nonlinear  filtering.  The 
first  problem  concerns  passive  tracking,  a  subtle  application  of 
filtering  to  a  system  involving  a  singular  nonlinearity  and  all  of 
the  potential  numerical  instabilities  that  result.  The  second  applica¬ 
tion  involves  communication  theory  in  the  sense  that  the  limitation  on 
current  application  of  demodulators  revolves  around  the  reliability  of 
the,  phase-locked  loop.  We  have  shown  that  threshold  extensions  of 
several  dB  are  possible  by  routine  application  of  optimal  nonlinear 
estimators.  In  addition  we  have  demonstrated  the  advantages  of  engineer¬ 
ing  the  error  loss  function  and  of  examining  the  conditional  expected 
loss  for  each  sample  sequence,  resulting  in  considerable  performance 
feedback  during  the  operation  of  the  filter. 

We  expect,  in  summary,  tnat  our  results  described  in  this  report 
will  be  seminal  to  an  expanding  development  of  experiments  and  research 
efforts  in  the  field  of  synthesizing  and  evaluating  nonlinear  estimators. 
We  anticipate,  further,  that  respect  for  and  understanding  of  numerical 
approximation  methods  will  continue  to  develop  in  both  among  applied 
engineers  and  the  mathematicians  concerned  with  nonlinear  estimation. 

It  is  in  this  spirit,  then,  that  we  c?fer  the  report  as  a  manual  -  an 
engineer's  guide  to  building  nonlinear  estimators. 
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